Poster in Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
Inference-Time Alignment via Hypothesis Reweighting
Yoonho Lee · Jonathan Williams · Henrik Marklund · Archit Sharma · Eric Mitchell · Anikait Singh · Chelsea Finn
Chat assistants must handle diverse and often conflicting user preferences. We propose a lightweight framework for the general challenge of aligning models to user intent at inference time. Our approach trains an efficient ensemble, i.e., a single neural network with multiple prediction heads, each representing a different function consistent with the training data. Our main contribution is HyRe, a simple adaptation technique that dynamically reweights ensemble members at test time using a small set of labeled examples from the target distribution; these examples can be labeled in advance or actively queried from a larger unlabeled pool. The computational cost of our training procedure is comparable to fine-tuning a single model, and thus scales to large pretrained backbones. We empirically validate HyRe on several target evaluation distributions. With as few as five preference pairs from each target distribution, adaptation via HyRe surpasses state-of-the-art reward models on RewardBench at both the 2B and 8B parameter scales.
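As a rough illustration of the reweighting step described in the abstract, the sketch below scores a handful of labeled preference pairs with each ensemble head and turns the per-head losses into normalized weights. The abstract does not specify the update rule, so everything here is an assumption made for illustration: the Bradley-Terry pair loss, the exponential-weights form (weight proportional to prior times exp of negative cumulative loss), and the hypothetical names `pair_loss`, `reweight_heads`, and `ensemble_reward`. It is not the authors' implementation.

```python
import math


def pair_loss(r_chosen, r_rejected):
    """Bradley-Terry negative log-likelihood: -log sigmoid(r_chosen - r_rejected).

    Assumed loss for illustration; computed as a numerically stable softplus.
    """
    margin = r_chosen - r_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def reweight_heads(head_pairs, prior=None):
    """Return one weight per head from a few labeled preference pairs.

    head_pairs[k] is a list of (r_chosen, r_rejected) rewards produced by
    head k on the adaptation examples. The prior must be strictly positive.
    Assumed rule: w_k proportional to prior_k * exp(-sum of head k's losses).
    """
    num_heads = len(head_pairs)
    if prior is None:
        prior = [1.0 / num_heads] * num_heads
    log_w = [
        math.log(p) - sum(pair_loss(c, r) for c, r in pairs)
        for p, pairs in zip(prior, head_pairs)
    ]
    top = max(log_w)  # subtract the max before exponentiating for stability
    unnorm = [math.exp(lw - top) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def ensemble_reward(weights, per_head_rewards):
    """Weighted combination of the heads' rewards for a single response."""
    return sum(w * r for w, r in zip(weights, per_head_rewards))


# Toy data (made up): three heads, five preference pairs each.
adaptation_pairs = [
    [(1.0, 0.0)] * 5,  # head 0 ranks every chosen response higher
    [(0.0, 0.0)] * 5,  # head 1 is indifferent
    [(0.0, 1.0)] * 5,  # head 2 ranks every chosen response lower
]
weights = reweight_heads(adaptation_pairs)
score = ensemble_reward(weights, [2.0, 0.0, -2.0])
```

In this toy setup the head that agrees with the labeled pairs receives the largest weight, so the combined reward leans toward its prediction. Only the weights change during adaptation; the network parameters stay fixed, which is what keeps this kind of update cheap at test time.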