

Poster

Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts

Mateo Espinosa Zarlenga · Gabriele Dominici · Pietro Barbiero · Zohreh Shams · Mateja Jamnik

East Exhibition Hall A-B #E-1102
[ Project Page ]
Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., "stripes", "black") and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM’s mispredicted concepts at test time) on CMs' task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.
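To make the setup concrete, the following minimal sketch (in PyTorch) illustrates the two-stage structure of a concept-based model and how a test-time concept intervention replaces predicted concept probabilities with expert-provided values. This is not the authors' code; the module names, layer sizes, and the intervention mask format are illustrative assumptions for this example.

# Illustrative sketch only (not the authors' implementation) of a concept
# bottleneck-style model and a test-time concept intervention.
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, input_dim, n_concepts, n_classes):
        super().__init__()
        # Stage 1: input -> concept logits (e.g., "stripes", "black").
        self.concept_encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, n_concepts),
        )
        # Stage 2: concept probabilities -> task label.
        self.label_predictor = nn.Linear(n_concepts, n_classes)

    def forward(self, x, intervened_concepts=None, intervention_mask=None):
        c_hat = torch.sigmoid(self.concept_encoder(x))
        if intervention_mask is not None:
            # A concept intervention overwrites the model's predicted concept
            # probabilities with expert-provided ground-truth values.
            c_hat = torch.where(intervention_mask.bool(), intervened_concepts, c_hat)
        return self.label_predictor(c_hat), c_hat

# Usage: an expert corrects the first concept of one sample at test time.
model = ConceptBottleneckModel(input_dim=32, n_concepts=5, n_classes=3)
x = torch.randn(1, 32)
mask = torch.tensor([[1., 0., 0., 0., 0.]])   # which concepts are intervened on
truth = torch.tensor([[1., 0., 0., 0., 0.]])  # expert-provided concept values
logits, concepts = model(x, intervened_concepts=truth, intervention_mask=mask)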

Lay Summary:

Recent advances in Artificial Intelligence (AI) have led to powerful models that can receive help in the form of "concept interventions". A concept intervention is an operation where, during deployment, an expert communicates the presence or absence of a high-level concept in the model's input by manipulating its inner representations. This way, for example, radiologists can let an AI assistant know that an X-ray scan has "bone spurs", helping the assistant make a more accurate diagnosis.

The real world, however, is messy. This means that the inputs we provide to models may contain noise or conditions that differ from those the model was exposed to during training. In this paper, we demonstrate that in these instances, concept interventions fail to properly aid the model in its downstream task. We argue that this is due to "leakage poisoning", where a model's representations become too corrupted for interventions to work.

We address this by proposing a way of representing concepts that enables the model to restrict this poisonous leakage whenever the input strays too far from what the model has been exposed to. Our results show that our representations lead to highly accurate models that remain intervenable when provided with both expected and unexpected inputs.
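As a rough illustration of the gating idea described above (and not the MixCEM architecture itself, which is specified in the paper), the sketch below pairs a concept prediction with a residual embedding that may carry leaked information and scales that residual by a learned score of how in-distribution the input appears; all names and the gating function are hypothetical.

# Hypothetical sketch of gating leaked information by an in-distribution
# score; this is NOT the MixCEM formulation from the paper.
import torch
import torch.nn as nn

class GatedConceptEmbedding(nn.Module):
    def __init__(self, feat_dim, emb_dim):
        super().__init__()
        self.concept_prob = nn.Linear(feat_dim, 1)        # concept prediction
        self.residual_emb = nn.Linear(feat_dim, emb_dim)  # may carry leaked information
        self.id_scorer = nn.Linear(feat_dim, 1)           # in-distribution score

    def forward(self, h):
        p = torch.sigmoid(self.concept_prob(h))   # concept probability
        gate = torch.sigmoid(self.id_scorer(h))   # ~1 if in-distribution, ~0 if OOD
        # The downstream predictor sees the residual only to the extent
        # the input looks in-distribution, limiting leakage for OOD inputs.
        residual = gate * self.residual_emb(h)
        return p, residual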
