Spotlight Poster
Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective
Weizhong Huang · Yuxin Zhang · Xiawu Zheng · Fei Chao · Rongrong Ji
East Exhibition Hall A-B #E-2406
Large language models like ChatGPT demand heavy computing power. Researchers often shrink these models by removing less important components ("sparsification"). But current methods face a hidden problem: the errors introduced by this simplification accumulate across the model's layers, like a snowball rolling downhill. These growing errors eventually crash the model's performance, a problem we call "error explosion."

We discovered a smarter way to simplify these AI systems. Imagine organizing the model's layers like a musical crescendo: minimal simplification in the early layers, gradually increasing in the later ones. This approach prevents error accumulation and requires tuning only one key parameter. Remarkably, finding the best pattern takes just a few attempts rather than exhaustive testing.

Our method makes simplified AI models significantly more accurate and efficient. Applied to a 7B model, it boosted task-solving accuracy by over 10% while making the model 70% leaner and doubling processing speed on both CPUs and GPUs. It also works for image and multimodal AI, enabling compact yet powerful models. For example, it could help run advanced AI assistants on everyday devices instead of energy-hungry servers. This breakthrough balances AI capability with real-world usability, making advanced models faster, greener, and more accessible.
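To make the "musical crescendo" idea concrete, here is a minimal Python sketch of a single-knob, monotonically increasing layer-wise sparsity schedule. The linear ramp, the parameter names (`target`, `slope`), and the rescaling step are illustrative assumptions, not the paper's exact allocation rule.

```python
"""Illustrative sketch (assumptions, not the authors' exact formula):
per-layer sparsity rises monotonically from early to late layers,
controlled by one parameter and rescaled to a target average sparsity."""

import numpy as np


def layerwise_sparsity(num_layers: int, target: float, slope: float) -> np.ndarray:
    """Return per-layer sparsity ratios that increase linearly with depth.

    num_layers: number of transformer layers to prune.
    target:     desired average sparsity, e.g. 0.7 for 70% of weights removed.
    slope:      the single tuning knob; 0 gives uniform sparsity, larger
                values prune later layers more aggressively.
    """
    # Linear ramp centred at zero: early layers get negative offsets
    # (less pruning), later layers positive offsets (more pruning).
    offsets = np.linspace(-1.0, 1.0, num_layers) * slope
    ratios = np.clip(target + offsets, 0.0, 0.99)
    # Rescale so the mean sparsity still matches the requested target.
    ratios *= target / ratios.mean()
    return np.clip(ratios, 0.0, 0.99)


if __name__ == "__main__":
    # Example: a 32-layer 7B-class model pruned to 70% average sparsity.
    schedule = layerwise_sparsity(num_layers=32, target=0.70, slope=0.15)
    print(schedule.round(3))                      # rises from ~0.55 to ~0.85
    print(f"mean sparsity = {schedule.mean():.3f}")
```

Because the schedule has only one free parameter (`slope` in this sketch), searching for a good pattern reduces to trying a handful of values rather than optimizing a separate sparsity ratio for every layer.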