Poster
An Efficient Pruner for Large Language Model with Theoretical Guarantee
Canhong Wen · Yihong Zuo · Wenliang Pan
East Exhibition Hall A-B #E-2904
Large Language Models (LLMs), like ChatGPT, have shown incredible capabilities in language understanding and generation. However, they come with a major drawback: their enormous size, which makes them slow, expensive, and difficult to run on many devices. To address this, researchers often use pruning, which removes parts of the model that seem less important, to reduce size while maintaining performance. But common pruning methods can be either inefficient or based on heuristic strategies with little mathematical justification.

In our work, we introduce a new pruning method with strong theoretical backing. We treat pruning as a mathematical problem that balances performance and simplicity, and solve it using a technique called monotone accelerated Iterative Hard Thresholding (mAIHT). Unlike many existing methods, ours comes with rigorous proofs showing that it works reliably and efficiently. We also test it extensively on popular open-source LLMs, showing that our approach removes unnecessary parts better than leading pruning methods while preserving the model's abilities.

This research helps make LLMs faster, cheaper, and more accessible without sacrificing much intelligence.
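To give a flavor of the underlying idea, the sketch below shows a generic monotone accelerated iterative hard thresholding loop applied to a layer-wise weight-reconstruction objective. This is an illustrative assumption, not the paper's exact mAIHT algorithm or objective: the function names (hard_threshold, maiht_sketch), the calibration matrix X, the step size eta, and the sparsity budget k are all hypothetical choices for the example.

```python
# Minimal sketch (assumed setup, not the authors' implementation) of monotone
# accelerated iterative hard thresholding for a layer-wise pruning problem:
#   min_W  ||X W - X W0||_F^2   subject to   W having at most k nonzeros,
# where W0 is the dense weight matrix and X is calibration input data.
import numpy as np

def hard_threshold(W, k):
    """Keep the k largest-magnitude entries of W and zero out the rest."""
    flat = np.abs(W).ravel()
    if k >= flat.size:
        return W.copy()
    cutoff = np.partition(flat, -k)[-k]
    return np.where(np.abs(W) >= cutoff, W, 0.0)

def maiht_sketch(X, W0, k, eta=None, n_iter=100):
    """Accelerated IHT with a monotone safeguard on the reconstruction loss."""
    loss = lambda W: np.linalg.norm(X @ (W - W0)) ** 2
    if eta is None:
        # Step size 1/L, where L = 2 * sigma_max(X)^2 bounds the gradient's Lipschitz constant
        eta = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    W = hard_threshold(W0, k)
    W_prev, best = W.copy(), loss(W)
    for t in range(1, n_iter + 1):
        # Nesterov-style extrapolation, gradient step, then projection onto the sparsity constraint
        V = W + (t - 1) / (t + 2) * (W - W_prev)
        grad = 2.0 * X.T @ (X @ (V - W0))
        W_new = hard_threshold(V - eta * grad, k)
        # Monotone safeguard: if acceleration increased the loss, fall back to a plain IHT step
        if loss(W_new) > best:
            W_new = hard_threshold(W - eta * 2.0 * X.T @ (X @ (W - W0)), k)
        W_prev, W = W, W_new
        best = min(best, loss(W))
    return W
```

The monotone safeguard is what distinguishes this variant from plain accelerated IHT: acceleration speeds up progress, while the fallback step guarantees the reconstruction loss never increases across iterations, which is the kind of property that makes convergence analysis tractable.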