

Poster

Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective

Aojun Lu · Hangjie Yuan · Tao Feng · Yanan Sun

East Exhibition Hall A-B #E-3507
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

The quest for Continual Learning (CL) seeks to empower neural networks with the ability to learn and adapt incrementally. Central to this pursuit is addressing the stability-plasticity dilemma, which involves striking a balance between two conflicting objectives: preserving previously learned knowledge and acquiring new knowledge. While numerous CL methods aim to achieve this trade-off, they often overlook the impact of network architecture on stability and plasticity, restricting the trade-off to the parameter level. In this paper, we delve into the conflict between stability and plasticity at the architectural level. We reveal that under an equal parameter constraint, deeper networks exhibit better plasticity, while wider networks are characterized by superior stability. To address this architectural-level dilemma, we introduce a novel framework denoted Dual-Arch, which serves as a plug-in component for CL. This framework leverages the complementary strengths of two distinct and independent networks: one dedicated to plasticity and the other to stability. Each network is designed with a specialized and lightweight architecture, tailored to its respective objective. Extensive experiments demonstrate that Dual-Arch enhances the performance of existing CL methods while being up to 87% more compact in terms of parameters.
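As a rough illustration only (not the authors' implementation), the sketch below pairs a deep-narrow "plastic" network with a wide-shallow "stable" network and moves new-task knowledge from the former to the latter via distillation, which is one plausible way such a dual-architecture setup could be instantiated. All layer sizes, optimizers, the loss combination, and the helper names (mlp, learn_task) are illustrative assumptions, not details from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, hidden_dim, depth, out_dim):
    # Build a simple MLP with `depth` hidden layers of width `hidden_dim`.
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden_dim), nn.ReLU()]
        d = hidden_dim
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

# Hypothetical dimensions: a deep-narrow learner for plasticity and a
# wide-shallow learner for stability, with roughly comparable parameter counts.
plastic = mlp(in_dim=784, hidden_dim=64, depth=8, out_dim=10)
stable  = mlp(in_dim=784, hidden_dim=256, depth=2, out_dim=10)

def learn_task(loader, epochs=1, temperature=2.0):
    """Train the plastic net on the new task, then distill into the stable net."""
    opt_p = torch.optim.SGD(plastic.parameters(), lr=0.01)
    opt_s = torch.optim.SGD(stable.parameters(), lr=0.01)
    for _ in range(epochs):
        for x, y in loader:
            # 1) The plastic learner fits the new data directly.
            loss_p = F.cross_entropy(plastic(x), y)
            opt_p.zero_grad(); loss_p.backward(); opt_p.step()
            # 2) The stable learner absorbs the new task via distillation plus
            #    the task loss; an assumed scheme for limiting forgetting.
            with torch.no_grad():
                soft = F.softmax(plastic(x) / temperature, dim=1)
            logits_s = stable(x)
            loss_s = F.cross_entropy(logits_s, y) + \
                     F.kl_div(F.log_softmax(logits_s / temperature, dim=1),
                              soft, reduction="batchmean")
            opt_s.zero_grad(); loss_s.backward(); opt_s.step()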

Lay Summary:

Continual learning enables AI systems to continuously acquire and update knowledge, but balancing plasticity (learning new tasks) and stability (retaining old knowledge) remains a major challenge. Traditional methods focus on optimizing parameters but overlook how the design of the AI model's architecture affects this balance. We discovered that deeper AI models excel at learning new tasks, while wider models are better at retaining old knowledge. To leverage both strengths, we introduced Dual-Arch, a framework that combines two specialized models: one deep and narrow for plasticity and another wide and shallow for stability. Our method improves continual learning performance across multiple benchmarks with a smaller model size. This advancement could benefit AI applications like personal assistants or robotics, where systems must continuously adapt without losing prior skills.
