

Poster

Zero Shot Generalization of Vision-Based RL Without Data Augmentation

Sumeet Batra · Gaurav Sukhatme

West Exhibition Hall B2-B3 #W-821
Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Generalizing vision-based reinforcement learning (RL) agents to novel environments remains a difficult and open challenge. Current trends are to collect large-scale datasets or use data augmentation techniques to prevent overfitting and improve downstream generalization. However, the computational and data collection costs increase exponentially with the number of task variations and can destabilize the already difficult task of training RL agents. In this work, we take inspiration from recent advances in computational neuroscience and propose a model, Associative Latent DisentAnglement (ALDA), that builds on standard off-policy RL to achieve zero-shot generalization. Specifically, we revisit the role of latent disentanglement in RL and show how combining it with a model of associative memory achieves zero-shot generalization on difficult task variations without relying on data augmentation. Finally, we formally show that data augmentation techniques are a form of weak disentanglement and discuss the implications of this insight.
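To make the high-level description concrete, the sketch below shows one plausible way to wire a disentangled latent encoder and an associative-memory readout in front of a standard off-policy actor, assuming a softmax content-addressable memory over learned latent prototypes. All module names, dimensions, and the retrieval mechanism are illustrative assumptions for exposition, not the paper's actual ALDA implementation.

```python
# Hypothetical sketch: disentangled encoder -> associative memory -> off-policy actor.
# Names, sizes, and the softmax retrieval are assumptions, not the ALDA codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangledEncoder(nn.Module):
    """Maps pixels to a factorized latent (per-factor mean and log-variance)."""

    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.mu = nn.LazyLinear(latent_dim)
        self.logvar = nn.LazyLinear(latent_dim)

    def forward(self, obs: torch.Tensor):
        h = self.conv(obs)
        return self.mu(h), self.logvar(h)


class AssociativeMemory(nn.Module):
    """Content-addressable memory: retrieves a convex combination of stored
    latent prototypes, pulling novel latents back toward familiar ones."""

    def __init__(self, latent_dim: int = 32, num_slots: int = 128, beta: float = 8.0):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(num_slots, latent_dim))
        self.beta = beta  # inverse temperature of the retrieval softmax (assumed)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        attn = F.softmax(self.beta * z @ self.slots.t(), dim=-1)  # (B, num_slots)
        return attn @ self.slots  # retrieved latent, same shape as z


class Actor(nn.Module):
    """Standard off-policy actor head operating on the retrieved latent."""

    def __init__(self, latent_dim: int = 32, action_dim: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


if __name__ == "__main__":
    encoder, memory, actor = DisentangledEncoder(), AssociativeMemory(), Actor()
    obs = torch.rand(4, 3, 64, 64)                         # batch of RGB observations
    mu, logvar = encoder(obs)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterized sample
    action = actor(memory(z))                              # act on the memory-cleaned latent
    print(action.shape)                                    # torch.Size([4, 6])
```

The intuition, under these assumptions, is that the memory readout maps an out-of-distribution latent (e.g., from a recolored or retextured scene) onto a combination of prototypes seen during training, so the downstream actor never has to handle truly unfamiliar inputs.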

Lay Summary:

Humans and other mammals show a remarkable ability to adapt to new situations, thanks to their robust visual systems. Modern neuroscience hypothesizes that this is because we can decompose what we see into independent components and relate them to things we have seen before. For example, if shown a cartoon rendering of a human, because the cartoon has two legs, arms, a torso, a head, etc., most of us would understand that it is meant to represent a real human, even if we had never seen that specific cartoon image before. Inspired by this capability, we designed an artificial agent that decomposes a visual scene into independent components and relates them to objects the agent has seen before, so that it can solve tasks even in novel environments, i.e., ones where the colors, background, and object textures differ. This allows our agent to generalize to new environments it has never seen without requiring additional training data.
