

Poster

A Cognac Shot To Forget Bad Memories: Corrective Unlearning for Graph Neural Networks

Varshita Kolipaka · Akshit Sinha · Debangan Mishra · Sumit Kumar · Arvindh Arun · Shashwat Goel · Ponnurangam Kumaraguru

East Exhibition Hall A-B #E-3004
[ Project Page ]
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: Graph Neural Networks (GNNs) are increasingly used for a variety of ML applications on graph data. Because graph data does not satisfy the independently and identically distributed (*i.i.d.*) assumption, adversarial manipulations or incorrect data can propagate to other data points through message passing, deteriorating the model's performance. To allow model developers to remove the adverse effects of manipulated entities from a trained GNN, we study the recently formulated problem of *Corrective Unlearning*. We find that current graph unlearning methods fail to unlearn the effect of manipulations even when the whole manipulated set is known. We introduce a new graph unlearning method, **Cognac**, which can unlearn the effect of the manipulation set even when only $5\%$ of it is identified. It recovers most of the performance of a strong oracle trained on fully corrected data, even beating retraining from scratch without the deletion set, and is $8\times$ more efficient while also scaling to large datasets. We hope our work assists GNN developers in mitigating harmful effects caused by issues in real-world data, post-training.

Lay Summary:

Graph Neural Networks (GNNs) are a class of machine learning models designed to learn from structured, interconnected data, such as social networks or molecular structures. A significant challenge arises with graph data: because the data points are not independent, erroneous or manipulated entries can propagate their negative effects throughout the network, degrading the model's overall performance. To make matters worse, we can never be sure we have identified all erroneous data points. We therefore ask: is there an efficient way to remove the harmful effects of such samples when we only know a handful of them?

To tackle this problem, we introduce Cognac. Our method is intuitive: we push the internal representations of manipulated nodes away from those of their neighbors to reduce the influence of the manipulation, while ensuring that the model maintains performance on the rest of the dataset. A minimal sketch of this idea appears below.

Cognac can restore a model's performance even when only a small fraction, as little as 5%, of the problematic data is identified. It is highly effective, recovering performance to a level comparable to a model trained on fully corrected data, and is approximately 8 times more efficient than the standard approach of completely retraining the system. This work provides a valuable tool for developers to mitigate the adverse effects of imperfect, real-world data in trained GNNs.
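To make the two-term idea concrete, here is a minimal sketch of a single corrective-unlearning step in PyTorch. This is not the authors' Cognac implementation: the model interface (returning both node embeddings and logits), the mask names, and the cosine-similarity push term are illustrative assumptions, and the paper's actual objective differs in detail.

```python
# A minimal sketch of the idea described above, NOT the official Cognac code.
# Assumptions (hypothetical): `model(x, edge_index)` returns node embeddings
# `z` and class logits; `manipulated_mask` marks identified manipulated nodes;
# `retain_mask` marks clean nodes we want the model to keep predicting well on.
import torch
import torch.nn.functional as F

def corrective_unlearning_step(model, x, edge_index, labels,
                               manipulated_mask, retain_mask,
                               optimizer, push_weight=1.0):
    """One gradient step: (a) push embeddings of identified manipulated nodes
    away from their neighbors, (b) preserve accuracy on retained nodes."""
    optimizer.zero_grad()
    z, logits = model(x, edge_index)   # node embeddings and predictions

    # (a) "Push away" term: reduce cosine similarity between each
    # manipulated node and its neighbors, weakening their influence.
    # Assumes at least one identified manipulated node has neighbors.
    src, dst = edge_index              # shape [2, num_edges]
    bad_edges = manipulated_mask[src]  # edges leaving manipulated nodes
    sim = F.cosine_similarity(z[src[bad_edges]], z[dst[bad_edges]], dim=-1)
    push_loss = sim.mean()             # lower similarity = further apart

    # (b) Utility term: standard cross-entropy on clean, retained nodes
    # so overall performance does not collapse while unlearning.
    retain_loss = F.cross_entropy(logits[retain_mask], labels[retain_mask])

    loss = retain_loss + push_weight * push_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

The two terms pull in opposite directions: the push term forgets the manipulated nodes' influence, while the retain term anchors the model to the clean data, with `push_weight` (a hypothetical knob) trading off forgetting against utility.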
