

Poster in Workshop: 3rd Workshop on High-dimensional Learning Dynamics (HiLD)

Input differentiation via negative computation

Linghao Kong · Angelina Ning · Nir Shavit


Abstract:

Understanding neuronal mechanisms in large language models remains challenging, particularly due to polysemanticity and superposition. In this work, we further investigate the previously identified "Wasserstein neurons," characterized by non-Gaussian pre-activation distributions. Our analysis reveals that these neurons are more prevalent and exhibit faster learning dynamics in larger models. Critically, we demonstrate for the first time the mechanistic significance of the negative activation space, showing that Wasserstein neurons leverage negative pre-activations for nuanced input differentiation, especially for syntactic and structural tokens. Ablation experiments confirm that constraining negative activations significantly degrades model performance, highlighting a previously underappreciated computational role for negative activations. These findings offer new directions for interpretability research by emphasizing the importance of negative computation.
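To make the two measurements concrete, here is a minimal sketch, not the authors' implementation: it scores a neuron's pre-activation distribution by its 1-Wasserstein distance to a moment-matched Gaussian (one natural reading of the "Wasserstein neuron" criterion), and registers a forward pre-hook that clamps negative pre-activations to zero as one possible form of the ablation. The module path `model.model.layers[i].mlp.act_fn`, the choice of a moment-matched Gaussian reference, and clamping at exactly zero are all assumptions for illustration; the abstract does not specify the paper's identification threshold or ablation details.

```python
# Minimal sketch (assumptions flagged in comments): flag candidate
# "Wasserstein neurons" and ablate their negative pre-activations.
import numpy as np
import torch
from scipy.stats import wasserstein_distance


def gaussian_wasserstein_score(pre_acts: np.ndarray, seed: int = 0) -> float:
    """1-Wasserstein distance between a neuron's empirical pre-activation
    samples and a Gaussian with matched mean and std. A large score flags
    a non-Gaussian pre-activation distribution (a candidate neuron).
    The moment-matched Gaussian reference is an assumption."""
    rng = np.random.default_rng(seed)
    mu, sigma = pre_acts.mean(), pre_acts.std()
    reference = rng.normal(mu, sigma, size=len(pre_acts))
    return wasserstein_distance(pre_acts, reference)


def clamp_negative_preacts(module, inputs):
    """Forward pre-hook: constrain negative pre-activations to zero
    before the MLP nonlinearity. Clamping at zero is one possible
    form of "constraining negative activations", not necessarily
    the paper's exact ablation."""
    (x,) = inputs
    return (torch.clamp(x, min=0.0),)


# Hypothetical usage, assuming a HuggingFace-style decoder layout
# (the attribute path is an assumption, not from the abstract):
#   act_fn = model.model.layers[i].mlp.act_fn
#   handle = act_fn.register_forward_pre_hook(clamp_negative_preacts)
#   ... evaluate perplexity or task metrics under the ablation ...
#   handle.remove()
```

Under these assumptions, comparing the score distribution across model sizes would be the natural way to probe the prevalence claim, and comparing metrics with and without the hook probes the degradation claim.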
