

Poster in Workshop: 3rd Workshop on High-dimensional Learning Dynamics (HiLD)

Input differentiation via negative computation

Linghao Kong · Angelina Ning · Nir Shavit


Abstract:

Understanding neuronal mechanisms in large language models remains challenging, particularly due to polysemanticity and superposition. In this work, we further investigate the previously identified "Wasserstein neurons," characterized by non-Gaussian pre-activation distributions. Our analysis reveals that these neurons are more prevalent and exhibit faster learning dynamics in larger models. Critically, we demonstrate for the first time the mechanistic significance of the negative activation space, showing that Wasserstein neurons leverage negative pre-activations for nuanced input differentiation, especially for syntactic and structural tokens. Ablation experiments confirm that constraining negative activations significantly degrades model performance, highlighting a previously underappreciated computational role for negative activations. These findings offer new directions for interpretability research by emphasizing the importance of negative computation.
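To make the two measurements concrete, here is a minimal sketch, not the authors' implementation: it scores a neuron's pre-activation distribution by its 1-Wasserstein distance to a moment-matched Gaussian (one natural reading of the "Wasserstein neuron" criterion), and registers a forward pre-hook that clamps negative pre-activations to zero as one possible form of the ablation. The module path `model.model.layers[i].mlp.act_fn`, the choice of a moment-matched Gaussian reference, and clamping at exactly zero are all assumptions for illustration; the abstract does not specify the paper's identification threshold or ablation details.

```python
# Minimal sketch (assumptions flagged in comments): flag candidate
# "Wasserstein neurons" and ablate their negative pre-activations.
import numpy as np
import torch
from scipy.stats import wasserstein_distance


def gaussian_wasserstein_score(pre_acts: np.ndarray, seed: int = 0) -> float:
    """1-Wasserstein distance between a neuron's empirical pre-activation
    samples and a Gaussian with matched mean and std. A large score flags
    a non-Gaussian pre-activation distribution (a candidate neuron).
    The moment-matched Gaussian reference is an assumption."""
    rng = np.random.default_rng(seed)
    mu, sigma = pre_acts.mean(), pre_acts.std()
    reference = rng.normal(mu, sigma, size=len(pre_acts))
    return wasserstein_distance(pre_acts, reference)


def clamp_negative_preacts(module, inputs):
    """Forward pre-hook: constrain negative pre-activations to zero
    before the MLP nonlinearity. Clamping at zero is one possible
    form of "constraining negative activations", not necessarily
    the paper's exact ablation."""
    (x,) = inputs
    return (torch.clamp(x, min=0.0),)


# Hypothetical usage, assuming a HuggingFace-style decoder layout
# (the attribute path is an assumption, not from the abstract):
#   act_fn = model.model.layers[i].mlp.act_fn
#   handle = act_fn.register_forward_pre_hook(clamp_negative_preacts)
#   ... evaluate perplexity or task metrics under the ablation ...
#   handle.remove()
```

Under these assumptions, comparing the score distribution across model sizes would be the natural way to probe the prevalence claim, and comparing metrics with and without the hook probes the degradation claim.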
