Poster
Probabilistic Group Mask Guided Discrete Optimization for Incremental Learning
Fengqiang Wan · Yang Yang
East Exhibition Hall A-B #E-1207
Incremental learning (IL) aims to sequentially learn new tasks while mitigating catastrophic forgetting. Among various IL strategies, parameter-isolation methods stand out by using mask techniques to allocate distinct parameters to each task, explicitly addressing forgetting. However, existing approaches often disregard parameter dependencies, resulting in an over-reliance on newly allocated parameters. To address this issue, we propose Probabilistic Group Mask selection (PGM), a group-wise approach that captures parameter dependencies by exploring candidate masks within each group. Specifically, PGM partitions parameters into groups, each with multiple candidate masks, assigns probabilities to these masks, and leverages Gumbel-Softmax for differentiable sampling, enabling efficient optimization of the discrete mask-selection process. Our theoretical analysis demonstrates that incorporating parameter dependencies enhances sub-network selection. Experiments on standard benchmarks confirm that PGM outperforms existing IL approaches. The source code is available at: https://github.com/njustkmg/ICML25-PGM.
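To make the mask-selection mechanism concrete, below is a minimal PyTorch sketch of group-wise mask selection with Gumbel-Softmax. It is an illustration under simplifying assumptions, not the authors' implementation: the layer name `GroupMaskedLinear`, the `group_size` and `num_candidates` settings, and the randomly generated candidate-mask pool are hypothetical choices made only for the example.

```python
# Minimal sketch (not the paper's code) of group-wise probabilistic mask
# selection with Gumbel-Softmax: weights are split into fixed-size groups,
# each group holds a categorical distribution over candidate binary masks,
# and a straight-through Gumbel-Softmax sample picks one mask per group.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupMaskedLinear(nn.Module):
    def __init__(self, in_features, out_features, group_size=4, num_candidates=8, tau=1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.tau = tau
        n_params = out_features * in_features
        assert n_params % group_size == 0, "group_size must divide the weight count"
        self.num_groups = n_params // group_size
        # Hypothetical candidate pool: random binary patterns shared across groups.
        self.register_buffer(
            "candidates", (torch.rand(num_candidates, group_size) > 0.5).float()
        )
        # Learnable selection logits: one categorical distribution per group.
        self.logits = nn.Parameter(torch.zeros(self.num_groups, num_candidates))

    def forward(self, x):
        # Differentiable (straight-through) sample of one candidate mask per group.
        sel = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)  # (G, C), one-hot rows
        group_masks = sel @ self.candidates                           # (G, group_size)
        mask = group_masks.reshape(self.weight.shape)
        return F.linear(x, self.weight * mask)


# Usage: the selection logits are optimized jointly with the task loss.
layer = GroupMaskedLinear(16, 8)
out = layer(torch.randn(4, 16))
out.sum().backward()  # gradients reach both the weights and the selection logits
```

With `hard=True`, the forward pass uses a one-hot selection while the straight-through estimator passes gradients to the selection logits, which is what lets the discrete per-group mask choice be optimized by ordinary backpropagation.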
When computers learn a series of tasks, they often forget earlier ones — a problem known as “catastrophic forgetting.” One common strategy to avoid this is to assign different parts of the model to different tasks, like giving each task its own space in the model’s memory. However, most of these approaches ignore the relationships between different parts of the model, leading to inefficient use of its capacity. To solve this, we propose a new method called Probabilistic Group Mask selection (PGM). It groups model parameters and explores multiple options within each group, using a technique that allows the computer to gradually learn which combination works best. This way, the model can better coordinate how it shares and separates knowledge across tasks. Our theoretical analysis shows that modeling these parameter relationships leads to better task-specific configurations.