Poster
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Hongyao Tang · Johan Obando-Ceron · Pablo Samuel Castro · Aaron Courville · Glen Berseth
West Exhibition Hall B2-B3 #W-702
Plasticity, or the ability of an agent to adapt to new tasks, environments, or distributions, is crucial for continual learning. In this paper, we study the loss of plasticity in deep continual RL through the lens of churn: the variability of network outputs on out-of-batch data induced by the update on each training batch. We demonstrate that (1) the loss of plasticity is accompanied by exacerbated churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix; and (2) reducing churn helps prevent rank collapse and adaptively adjusts the step size of regular RL gradients. Moreover, we introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate that it improves learning performance and outperforms baselines across a diverse range of continual learning environments from the OpenAI Gym Control, ProcGen, DeepMind Control Suite, and MinAtar benchmarks.
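To make the notion of churn concrete, the following is a minimal sketch of how one could measure it: take a gradient step on one training batch, then check how much the network's outputs move on held-out reference states that were not in that batch. The tiny linear "Q-network", the shapes, and all variable names here are illustrative assumptions, not the paper's actual architecture or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear "Q-network": q(s) = s @ W.
# (4-dim states, 2 actions; a stand-in for a deep RL network.)
W = rng.normal(scale=0.1, size=(4, 2))

def q(states, W):
    return states @ W

ref = rng.normal(size=(64, 4))       # held-out reference states
batch = rng.normal(size=(32, 4))     # one training batch
targets = rng.normal(size=(32, 2))   # stand-in regression targets

q_ref_before = q(ref, W)

# One gradient step on the batch (MSE loss).
lr = 0.1
grad = 2 * batch.T @ (q(batch, W) - targets) / len(batch)
W_new = W - lr * grad

# Churn: how much the outputs moved on data NOT in the update batch.
churn = np.mean(np.abs(q(ref, W_new) - q_ref_before))
print(f"churn on held-out states: {churn:.4f}")
```

A churn-reduction regularizer in the spirit of C-CHAIN would add a penalty on exactly this quantity during training, so that each batch update perturbs out-of-batch outputs less; see the paper for the actual objective.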
Natural intelligent creatures are able to learn throughout their lifetimes; human beings, for instance, keep absorbing new information and learning new tasks every day. This kind of learnability, or plasticity, is non-trivial for artificial intelligence (AI): most existing AI systems are built to perform specific tasks at a near-human or super-human level, and the continual learning ability of AI agents remains an open challenge.

In this paper, we study the plasticity issue of AI methods that learn a temporal sequence of tasks. We show that one cause of the plasticity issue is unregularized generalization or interference behavior inside AI models. Based on formal analysis and empirical investigation, we propose a regularization method called C-CHAIN to suppress this behavior, which is demonstrated to successfully mitigate the plasticity issue and improve the learnability of AI models in continual learning scenarios.

Our work helps to better understand the learning behaviors and issues of AI models. Our method, and the idea behind it, can serve as an easy-to-implement choice for continual learning problems. Our findings also highlight a distinction between natural intelligence and AI, from which further inspiration could be drawn toward AI models with a higher level of intelligence.