Poster in Workshop: Scaling Up Intervention Models
Keep the Alignment, Skip the Overhead: Lightweight Instruction Alignment for Continually Trained LLMs
Ishan Jindal · Badrinath Chandana · Pranjal Bharti · Lakkidi Vinay · Sachin Sharma
Instruction fine-tuning aligns language models with human intent but is computationally costly. Continual pretraining on domain-specific data, while effective for adaptation, can degrade instruction-following capabilities. We introduce instruction residuals (the parameter delta between an instruction-tuned model and its base model) as a lightweight mechanism to recover instruction alignment after adaptation. Instruction residuals can be transferred across checkpoints within the same model family, enabling restoration of instruction-following behavior without full retraining. We evaluate our method on LLaMA and Qwen models under domain shifts of up to 1B tokens, showing that instruction residuals effectively preserve alignment while allowing continual domain learning. Our results establish a practical framework for modular, compute-efficient instruction retention in evolving language models.
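
The sketch below, which is not part of the original abstract, illustrates the residual arithmetic the abstract describes: subtract the base checkpoint from its instruction-tuned counterpart, then add that delta to a continually pretrained checkpoint of the same family. It assumes all three checkpoints share an architecture and parameter names; the Hugging Face model IDs are placeholders, not the authors' released models.

```python
# Minimal sketch of instruction residuals, assuming the base, instruction-tuned,
# and domain-adapted checkpoints share parameter names. Model IDs are placeholders.
import torch
from transformers import AutoModelForCausalLM

def load_state(model_id):
    # Load in full precision on CPU so the tensor arithmetic is exact and simple.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
    return model, model.state_dict()

# 1) The residual is the per-parameter delta: theta_instruct - theta_base.
_, base_sd = load_state("base-model")               # placeholder ID
_, instruct_sd = load_state("base-model-instruct")  # placeholder ID
residual = {name: instruct_sd[name] - base_sd[name] for name in base_sd}

# 2) Add the residual to a continually pretrained checkpoint of the same family.
adapted_model, adapted_sd = load_state("base-model-domain-adapted")  # placeholder ID
realigned_sd = {name: adapted_sd[name] + residual[name] for name in adapted_sd}

# 3) Load the re-aligned weights; no gradient updates or retraining are needed.
adapted_model.load_state_dict(realigned_sd)
adapted_model.save_pretrained("realigned-model")
```

Because the operation is pure tensor addition over matching parameter names, it runs in minutes on CPU and leaves the domain-adapted knowledge in place while restoring instruction-following behavior.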