Poster
DA-KD: Difficulty-Aware Knowledge Distillation for Efficient Large Language Models
Changyi He · Yifu Ding · Jinyang Guo · Ruihao Gong · Haotong Qin · Xianglong Liu
East Exhibition Hall A-B #E-2507
Large language models are powerful but slow and expensive to train. One way to make them more efficient is to teach a smaller model to copy the behavior of a large one, a process called distillation. Most existing methods, however, waste training time on examples that are already easy for the small model to learn.

We propose a smarter method that focuses only on the hard examples the small model still struggles with. It also uses a better loss to guide the learning process, so the model trains more smoothly and effectively.

Our approach builds smaller models that are just as good as, or even better than, the large ones, while using much less time and computing power. This makes it easier to use advanced language models in everyday applications.
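To make the "focus on hard examples" idea concrete, here is a minimal sketch of a difficulty-aware distillation step. It assumes difficulty is measured by the per-example divergence between student and teacher predictions and that only the hardest fraction of each batch contributes to the loss; the function name, the `keep_ratio` and `temperature` parameters, and the KL-based difficulty score are illustrative choices, not the paper's exact criterion or its improved distillation loss.

```python
import torch
import torch.nn.functional as F

def difficulty_aware_distill_step(student, teacher, input_ids, attention_mask,
                                  keep_ratio=0.5, temperature=2.0):
    """One hypothetical training step that distills only on hard examples.

    Difficulty is approximated by the per-example KL divergence between the
    teacher's and student's token distributions; this stands in for the
    paper's actual difficulty metric and loss.
    """
    with torch.no_grad():
        teacher_logits = teacher(input_ids, attention_mask=attention_mask).logits
    student_logits = student(input_ids, attention_mask=attention_mask).logits

    # Per-token KL(teacher || student), averaged over valid tokens per example.
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    s_logprob = F.log_softmax(student_logits / temperature, dim=-1)
    token_kl = (t_prob * (t_prob.clamp_min(1e-9).log() - s_logprob)).sum(-1)  # [B, T]
    mask = attention_mask.float()
    example_difficulty = (token_kl * mask).sum(-1) / mask.sum(-1)             # [B]

    # Keep only the hardest examples in the batch for the distillation loss.
    k = max(1, int(keep_ratio * example_difficulty.numel()))
    hard_idx = example_difficulty.topk(k).indices
    loss = example_difficulty[hard_idx].mean() * temperature ** 2
    return loss
```

In practice, easy examples could instead be dropped before the forward pass to save compute, which is where most of the training-time savings would come from.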