Oral in Workshop: Tiny Titans: The next wave of On-Device Learning for Foundation Models (TTODLer-FM)
Preserve then Quantize: Dominant-Subspace Guided Low-Rank Reconstruction
Yoonjun Cho · Dongjae Jeon · Soeun Kim · Albert No
Post-training quantization (PTQ) enables efficient deployment of LLMs by converting weights to low-bit formats, but often degrades accuracy. Quantization error reconstruction (QER) mitigates this by adding a low-rank correction term. However, existing QER methods typically quantize the weights before identifying low-rank structure, discarding information they later attempt to recover. We propose Structured Residual Reconstruction (SRR), a simple yet effective reformulation of QER that first preserves the dominant spectral directions and quantizes only the residual tail. The final approximation combines the preserved low-rank structure with a quantized residual, yielding improved fidelity under the same rank constraint. SRR generalizes to activation-aware settings by selecting dominant components based on their contributions in both the original and activation-weighted spaces. We also apply SRR to quantized parameter-efficient fine-tuning (QPEFT) by freezing the preserved subspace and updating only the residual component during fine-tuning, which stabilizes training and leads to better adaptation. Across both PTQ and QPEFT, SRR consistently improves performance under fixed rank constraints, providing an effective framework for quantization-aware compression.
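To make the preserve-then-quantize idea concrete, here is a minimal NumPy sketch of the core decomposition under stated assumptions: the dominant rank-r subspace is taken from a plain truncated SVD, and the residual tail is quantized with simple symmetric round-to-nearest quantization. This is a stand-in, not the authors' implementation; the paper's quantizer and the activation-aware selection rule are not reproduced here, and all names below are ours.

```python
import numpy as np

def srr_reconstruct(W: np.ndarray, rank: int, n_bits: int = 4) -> np.ndarray:
    """Sketch of Structured Residual Reconstruction (SRR).

    Preserve the top-`rank` spectral directions of W in full precision,
    then quantize only the residual tail. In practice the low-rank part
    would be stored as two factors; here it is formed densely for clarity.
    """
    # 1. Preserve the dominant subspace via truncated SVD.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    low_rank = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank, :]

    # 2. Quantize only the residual tail (symmetric round-to-nearest
    #    uniform quantizer, assumed here for illustration).
    residual = W - low_rank
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(residual).max() / qmax
    residual_q = np.clip(np.round(residual / scale), -qmax - 1, qmax) * scale

    # 3. Final approximation: preserved low-rank structure + quantized residual.
    return low_rank + residual_q

# Toy check on a random weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
err = np.linalg.norm(W - srr_reconstruct(W, rank=64)) / np.linalg.norm(W)
print(f"relative reconstruction error (rank 64, 4-bit tail): {err:.4f}")
```

The ordering is the point of contrast with standard QER: rather than quantizing W first and then fitting a rank-r correction to the quantization error, SRR removes the dominant directions before quantization, so those directions are never distorted by the quantizer.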