Poster

Explaining, Fast and Slow: Abstraction and Refinement of Provable Explanations

Shahaf Bassan · Yizhak Elboher · Tobias Ladner · Matthias Althoff · Guy Katz

East Exhibition Hall A-B #E-1204
Tue 15 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Despite significant advancements in post-hoc explainability techniques for neural networks, many current methods rely on heuristics and do not provide formally provable guarantees over the explanations provided. Recent work has shown that it is possible to obtain explanations with formal guarantees by using neural network verification techniques to identify subsets of input features that are sufficient to determine that predictions remain unchanged. Despite the appeal of these explanations, their computation faces significant scalability challenges. In this work, we address this gap by proposing a novel abstraction-refinement technique for efficiently computing provably sufficient explanations of neural network predictions. Our method abstracts the original large neural network by constructing a substantially reduced network, where a sufficient explanation of the reduced network is also provably sufficient for the original network, hence significantly speeding up the verification process. If the explanation is insufficient on the reduced network, we iteratively refine the network by gradually increasing its size until convergence. Our experiments demonstrate that our approach enhances the efficiency of obtaining provably sufficient explanations for neural network predictions while additionally providing a fine-grained interpretation of the network's predictions across different abstraction levels.
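To make the abstraction-refinement loop concrete, here is a minimal sketch in Python. It is an illustration of the control flow described in the abstract, not the paper's actual implementation: the helper names `abstract`, `refine`, and `verify_sufficient`, and the function `explain_with_abstraction`, are all hypothetical placeholders standing in for a network-reduction procedure and a neural network verification oracle.

```python
# Minimal sketch of the abstraction-refinement loop, assuming three
# hypothetical callables supplied by the caller:
#   abstract(network)          -> a substantially reduced network
#   refine(reduced, network)   -> an enlarged reduced network (assumed to
#                                 reach the original in finitely many steps)
#   verify_sufficient(net, S)  -> True iff the verifier proves that fixing
#                                 the features in S keeps the prediction
#                                 of `net` unchanged
from typing import Any, Callable, Set


def explain_with_abstraction(
    network: Any,
    candidate: Set[int],
    abstract: Callable[[Any], Any],
    refine: Callable[[Any, Any], Any],
    verify_sufficient: Callable[[Any, Set[int]], bool],
) -> Set[int]:
    """Try to prove sufficiency of `candidate` on a reduced network;
    by soundness of the abstraction, a proof on the reduced network
    also holds for the original."""
    reduced = abstract(network)
    while True:
        if verify_sufficient(reduced, candidate):
            # Proven on the small network => provably sufficient for
            # the original, at a fraction of the verification cost.
            return candidate
        if reduced is network:
            # Fully refined back to the original network: the failure
            # is not spurious, so the candidate is truly insufficient.
            raise ValueError("candidate explanation is not sufficient")
        # The failure may be an artifact of over-approximation:
        # enlarge the reduced network and try again.
        reduced = refine(reduced, network)
```

The key property being exploited is one-directional: a sufficiency proof transfers from the reduced network to the original, while a failed proof on the reduced network is inconclusive and triggers refinement.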

Lay Summary:

As neural networks are increasingly used in high-stakes decisions, understanding their predictions is critical. While many explainability methods exist, most rely on heuristics, meaning we can't guarantee that the explanations they produce are accurate. This limits our ability to fully trust or rely on such explanations.

Recent research has shown that it's possible to generate provably sufficient explanations: subsets of features that, when fixed, provably guarantee the model's prediction remains unchanged. These explanations offer strong guarantees, but they're often too slow to compute on large models.

In our work, we tackle this challenge with a new abstraction-refinement algorithm. We first build a smaller, simplified version of the model and search for sufficient explanations there. If successful, we can prove the explanation also holds for the original model. If not, we keep refining the model step by step, increasing its size until we find an explanation that's guaranteed to be both sufficient and minimal (a minimization sketch follows below).

Our results show this method makes formal explanations significantly more scalable, offering both speed and trust in the explanations we provide.
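As a complement to the loop above, here is one standard way such an explanation is typically shrunk to a subset-minimal one: greedy deletion, which tries dropping one feature at a time and keeps the drop only if sufficiency can still be proven. This is a common scheme for minimal sufficient explanations and is offered here as an assumption about the minimization step, not as the paper's exact procedure; `verify_sufficient` is the same hypothetical verification oracle as in the earlier sketch.

```python
# Greedy deletion sketch: the result is subset-minimal, i.e., removing
# any single remaining feature would break provable sufficiency.
from typing import Any, Callable, Set


def minimize_explanation(
    network: Any,
    explanation: Set[int],
    verify_sufficient: Callable[[Any, Set[int]], bool],
) -> Set[int]:
    # Iterate over a snapshot of the initial explanation; features are
    # only ever removed, never added back.
    for feature in sorted(explanation):
        trial = explanation - {feature}
        if verify_sufficient(network, trial):
            # The feature was redundant: sufficiency still holds
            # without it, so drop it permanently.
            explanation = trial
    return explanation
```

In practice, each `verify_sufficient` call here could itself go through the abstraction-refinement loop, so that minimization also benefits from verifying against the reduced network whenever possible.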
