Poster in Workshop: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
Value Conditioned Policy Fine Tuning for Test Time Domain Adaptation
Harit Pandya · Ignas Budvytis · Rudra Poudel · Stephan Liwicki
Rapid cross-domain adaptation of learned policies is a key enabler for efficient robot deployment to new environments. In particular, sim-to-real transfer remains a core challenge in reinforcement learning (RL) due to the unavoidable difference in world dynamics. While naïve policy updates with fine-tuning are unstable due to noisy gradients under domain shifts, other methods typically learn a new policy from scratch, relying on data from both the source and target domains via selective data sharing or reward shaping. However, neither approach is suitable for time-efficient policy adaptation or for adaptation without access to an efficient simulator during deployment. In contrast, we propose value-conditioned policy fine-tuning, which leverages the existing Q-function to estimate trust regions for a stable policy update. In practice, this can be achieved simply by combining gradients from the pre-trained and current Q-functions. We conduct extensive experiments on the MuJoCo dynamics adaptation benchmark for online adaptation, demonstrating competitive performance compared to existing state-of-the-art methods with over 3.5x faster runtime.
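As a rough illustration of the gradient-combination idea described above, the sketch below shows an actor update whose loss mixes a frozen pre-trained (source-domain) critic with the adapting (target-domain) critic, so their gradients with respect to the policy are combined. This is a minimal sketch under assumed names (`beta`, `q_pretrained`, `q_current`, `policy` are illustrative), not the authors' implementation.

```python
# Minimal sketch (not the paper's code): actor-critic style policy update where the
# policy gradient flows through a convex combination of a frozen pre-trained
# Q-function and the current Q-function being adapted to the target domain.
import torch


def value_conditioned_actor_loss(policy, q_pretrained, q_current, states, beta=0.5):
    """Actor loss whose gradient mixes the frozen pre-trained Q and the current Q.

    The frozen pre-trained critic acts as an implicit trust region: its gradient
    anchors the policy to actions that were valuable in the source domain, while
    the current critic pulls the policy toward the target-domain dynamics.
    `beta` (assumed hyperparameter) trades off the two gradient signals.
    """
    actions = policy(states)
    # Gradients flow through `actions` into both critics; the pre-trained critic's
    # parameters are kept frozen, only its gradient w.r.t. the action is used.
    q_src = q_pretrained(states, actions)
    q_tgt = q_current(states, actions)
    # Combining the two value estimates combines their policy gradients.
    return -(beta * q_src + (1.0 - beta) * q_tgt).mean()
```

A typical usage would freeze `q_pretrained`, keep training `q_current` on target-domain transitions, and minimize this loss with the policy optimizer, so the pre-trained value landscape damps noisy gradients caused by the domain shift.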