Poster in Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
Value Conditioned Policy Fine Tuning for Test Time Domain Adaptation
Harit Pandya · Ignas Budvytis · Rudra Poudel · Stephan Liwicki
Rapid cross-domain adaptation of learned policies is a key enabler for efficient robot deployment to new environments. In particular, sim-to-real transfer remains a core challenge in reinforcement learning (RL) due to the unavoidable difference in world dynamics. While naïve policy updates with fine-tuning are unstable due to noisy gradients under domain shifts, other methods typically learn a new policy from scratch, relying on data from both the source and target domains via selective data sharing or reward shaping. However, neither approach is suitable for time-efficient policy adaptation or for adaptation without access to an efficient simulator during deployment. In contrast, we propose value-conditioned policy fine-tuning, which leverages the existing Q-function to estimate trust regions for a stable policy update. In practice, this can be achieved simply by combining gradients from the pre-trained and current Q-functions. We conduct extensive experiments on the MuJoCo dynamics adaptation benchmark for online adaptation, demonstrating competitive performance compared to existing state-of-the-art methods with over 3.5x faster runtime.
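As a rough illustration of the gradient-combination idea described above, the sketch below shows an actor update whose loss mixes a frozen pre-trained (source-domain) critic with the adapting (target-domain) critic, so their gradients with respect to the policy are combined. This is a minimal sketch under assumed names (`beta`, `q_pretrained`, `q_current`, `policy` are illustrative), not the authors' implementation.

```python
# Minimal sketch (not the paper's code): actor-critic style policy update where the
# policy gradient flows through a convex combination of a frozen pre-trained
# Q-function and the current Q-function being adapted to the target domain.
import torch


def value_conditioned_actor_loss(policy, q_pretrained, q_current, states, beta=0.5):
    """Actor loss whose gradient mixes the frozen pre-trained Q and the current Q.

    The frozen pre-trained critic acts as an implicit trust region: its gradient
    anchors the policy to actions that were valuable in the source domain, while
    the current critic pulls the policy toward the target-domain dynamics.
    `beta` (assumed hyperparameter) trades off the two gradient signals.
    """
    actions = policy(states)
    # Gradients flow through `actions` into both critics; the pre-trained critic's
    # parameters are kept frozen, only its gradient w.r.t. the action is used.
    q_src = q_pretrained(states, actions)
    q_tgt = q_current(states, actions)
    # Combining the two value estimates combines their policy gradients.
    return -(beta * q_src + (1.0 - beta) * q_tgt).mean()
```

A typical usage would freeze `q_pretrained`, keep training `q_current` on target-domain transitions, and minimize this loss with the policy optimizer, so the pre-trained value landscape damps noisy gradients caused by the domain shift.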