Poster
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang · Chi-Chih Chang · Natalia Frumkin · Kai-Chiang Wu · Mohamed Abdelfattah · Diana Marculescu
East Exhibition Hall A-B #E-2700
Large AI models are powerful but often too big and slow to run efficiently on everyday devices, or even in the cloud. State Space Models (SSMs) are a newer type of AI model that use memory more efficiently than the popular Transformer models, making them a promising alternative. However, running these models quickly across different hardware remains a challenge.

Our work introduces Quamba2, a method that makes these models smaller and faster by converting the numbers they store and compute with into simpler, lower-precision formats. This helps the models run well on everything from cloud servers to personal laptops. Quamba2 supports several precision levels, so it can trade off speed and accuracy depending on where it is deployed.

We tested Quamba2 on large models and found that it cuts memory use by up to 4× and speeds up responses significantly, with only a small drop in accuracy. This brings us closer to making powerful AI models work smoothly across a wide range of platforms. Our code and models will be shared with the community.
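To give a feel for what "converting numbers into lower-precision formats" means, here is a minimal sketch of symmetric post-training quantization in Python with NumPy. The function names and setup are our own illustration, not the Quamba2 codebase; Quamba2 itself uses more sophisticated, SSM-specific schemes.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, n_bits: int = 8):
    """Map float values to signed n_bits integers (illustrative only)."""
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float values."""
    return q.astype(np.float32) * scale

# A float tensor standing in for a model weight, and its 8-bit version.
w = np.random.randn(4, 4).astype(np.float32)
w_q, s = quantize_symmetric(w, n_bits=8)    # int8 is 4x smaller than float32
w_hat = dequantize(w_q, s)                  # close to w, with a small error
print("max reconstruction error:", np.abs(w - w_hat).max())
```

Storing int8 instead of float32 shrinks the tensor 4×, which is the basic mechanism behind the memory savings described above; the research challenge is choosing scales and precision levels so that accuracy barely drops.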