Poster
Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks
Jeongmo Kim · Yisak Park · Minung Kim · Seungyul Han
West Exhibition Hall B2-B3 #W-700
Meta reinforcement learning aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method successfully preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments. Our code is available at https://github.com/JM-Kim-94/tavt.git.
Can an agent that has learned only forward movement move in a different direction? Agents deployed in the real world should be able to adapt to changing environments and varied tasks. In particular, adapting to unseen tasks is very challenging. To overcome this, the TAVT algorithm proposes two main components. First, the agent learns representations of the training task set based on a newly defined task metric (the Task-Aware structure). Second, it generates virtual tasks from the learned task representations and uses them to train the agent (the Virtual Training structure). Thanks to this virtual training scheme, an agent trained with TAVT can adapt to unseen tasks that it has never actually experienced.
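The Virtual Training idea can be illustrated with a minimal sketch: encode each training task's context into a latent, then interpolate between two training-task latents to synthesize a latent for a task the agent never experienced. All function names and the simple mean-based encoder below are hypothetical placeholders, not the paper's actual metric-based representation learning:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_task(context):
    # Hypothetical task encoder: summarizes a batch of transition
    # features into a task latent (here, just the feature mean).
    return context.mean(axis=0)

def make_virtual_latent(z_a, z_b, alpha):
    # Virtual Training sketch: interpolate between two training-task
    # latents to stand in for an out-of-distribution task.
    return (1.0 - alpha) * z_a + alpha * z_b

# Contexts from two training tasks (e.g., "move forward" vs. "move backward").
ctx_forward = rng.normal(loc=1.0, size=(32, 4))
ctx_backward = rng.normal(loc=-1.0, size=(32, 4))

z_fwd = encode_task(ctx_forward)
z_bwd = encode_task(ctx_backward)

# A virtual task latent halfway between the two training tasks; the policy
# would then be trained on such latents in addition to the real ones.
z_virtual = make_virtual_latent(z_fwd, z_bwd, alpha=0.5)
```

In the actual method, virtual task generation is guided by the learned task metric so that the synthesized latents preserve task characteristics; this sketch only conveys the high-level mechanism.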