Poster in Workshop: Scaling Up Intervention Models

Keep the Alignment, Skip the Overhead: Lightweight Instruction Alignment for Continually Trained LLMs

Ishan Jindal · Badrinath Chandana · Pranjal Bharti · Lakkidi Vinay · Sachin Sharma


Abstract:

Instruction fine-tuning aligns language models with human intent but is computationally costly. Continual pretraining on domain-specific data, while effective for adaptation, can degrade instruction-following capabilities. We introduce instruction residuals, the parameter-wise delta between an instruction-tuned model and its base model, as a lightweight mechanism to recover instruction alignment after adaptation. Instruction residuals can be transferred across checkpoints within the same model family, enabling restoration of instruction-following behavior without full retraining. We evaluate our method on LLaMA and Qwen models under domain shifts of up to 1B tokens, showing that instruction residuals effectively preserve alignment while allowing continual domain learning. Our results establish a practical framework for modular, compute-efficient instruction retention in evolving language models.
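A minimal sketch of the idea described in the abstract, written with PyTorch and Hugging Face Transformers: compute the parameter-wise residual between an instruction-tuned checkpoint and its base model, then add it back onto a continually pretrained checkpoint from the same family. The checkpoint names and output path below are placeholders, not the authors' exact setup, and the sketch assumes all three checkpoints share identical parameter names and shapes.

    import torch
    from transformers import AutoModelForCausalLM

    # Placeholder checkpoints; substitute any base / instruction-tuned /
    # continually pretrained trio from the same model family.
    BASE = "meta-llama/Llama-3.1-8B"
    INSTRUCT = "meta-llama/Llama-3.1-8B-Instruct"
    CONTINUED = "./llama-3.1-8b-domain-adapted"  # base model after continual pretraining

    def load_state(name):
        """Load a causal LM and return the model together with its state dict."""
        model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
        return model, model.state_dict()

    _, base_sd = load_state(BASE)
    _, instruct_sd = load_state(INSTRUCT)
    continued_model, continued_sd = load_state(CONTINUED)

    # Instruction residual: parameter-wise delta between the instruction-tuned
    # model and its base model (assumes matching keys and tensor shapes).
    residual = {k: instruct_sd[k] - base_sd[k] for k in base_sd}

    # Re-align the continually pretrained checkpoint by adding the residual back.
    realigned_sd = {k: continued_sd[k] + residual[k] for k in continued_sd}
    continued_model.load_state_dict(realigned_sd)
    continued_model.save_pretrained("./llama-3.1-8b-domain-adapted-realigned")

Because the residual is just a dictionary of tensors, it can be computed once per model family and stored, then reapplied to any later continually trained checkpoint without rerunning instruction fine-tuning.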
