ICML Poster IBCircuit: Towards Holistic Circuit Discovery with Information Bottleneck

Tian Bian · Yifan Niu · Chaohao Yuan · Chengzhi Piao · Bingzhe Wu · Long-Kai Huang · Yu Rong · Tingyang Xu · Hong Cheng · Jia Li

East Exhibition Hall A-B #E-2402

Tue 15 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Circuit discovery has recently attracted attention as a promising research direction for explaining the non-trivial behaviors of language models. It aims to find the computational subgraphs, also known as circuits, within the model that are responsible for solving specific tasks. However, most existing studies overlook the holistic nature of these circuits and require designing specific corrupted activations for each task, which is both inaccurate and inefficient. In this work, we propose IBCircuit, an end-to-end approach based on the Information Bottleneck principle, to holistically identify informative circuits. In contrast to traditional causal interventions, IBCircuit is an optimization framework for holistic circuit discovery and can be applied to any given task without the tedious design of corrupted activations. On both the Indirect Object Identification (IOI) and Greater-Than tasks, IBCircuit identifies more faithful and minimal circuits, in terms of critical node and edge components, than recent related work.
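
For readers unfamiliar with the Information Bottleneck principle named in the abstract, its generic objective can be written as below. This is the standard textbook form, not necessarily the exact loss IBCircuit optimizes. Reading X as the full model's computation, Z as the candidate circuit, and Y as the task output is an illustrative assumption that the abstract does not state.

```latex
% Generic Information Bottleneck Lagrangian (standard form; the mapping of
% X, Z, Y to model, circuit, and task output is an assumption for illustration).
% Z is a compressed representation of X that should stay predictive of Y;
% \beta > 0 trades compression against predictiveness.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Under this reading, the first term favors a minimal circuit and the second favors a faithful one, which matches the two properties the abstract reports.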

Lay Summary:

Modern language models, like those powering chatbots and translation services, are incredibly complex and often act as "black boxes," making it hard to understand how they perform specific tasks. Researchers want to uncover the hidden pathways, or "circuits," inside these models that handle particular functions, such as identifying the subject in a sentence or solving math problems. However, existing methods to find these circuits are often cumbersome and inefficient because they require designing specific tests for each task.

To address this, the authors of the paper introduce IBCircuit, a new method inspired by the Information Bottleneck principle, a technique that helps filter out unnecessary information while keeping the important details. IBCircuit automates the discovery of these critical circuits in a holistic way, meaning it looks at the entire model rather than isolated parts. This approach eliminates the need for manually creating task-specific tests, making the process faster and more accurate.

Through experiments, IBCircuit successfully identified the essential circuits in language models that handle tasks like understanding sentence structure and performing arithmetic. Compared to previous methods, IBCircuit found circuits that are both simpler and more reliable, enhancing our ability to interpret and trust these powerful AI systems. Ultimately, this advancement helps bridge the gap between complex AI models and human understanding, paving the way for more transparent and dependable artificial intelligence.
