Poster
in
Workshop: Methods and Opportunities at Small Scale (MOSS)

Decomposed Learning: An Avenue for Mitigating Grokking

Gabryel Mason-Williams · Israel Mason-Williams

Keywords: [ compression ] [ grokking ] [ SVD ] [ linear algebra ] [ optimisation ]


Abstract: Grokking is a delayed transition from memorisation to generalisation in neural networks. It challenges perspectives on efficient learning, particularly in structured tasks and small-data regimes. We explore grokking in modular arithmetic from the perspective of a training pathology. We use Singular Value Decomposition (SVD) to reparameterise each weight matrix of a neural network, replacing the weight matrix $W$ with the product of three matrices, $U$, $\Sigma$ and $V^T$. Through empirical evaluations on the modular addition task, we show that this representation significantly reduces the effect of grokking and, in some cases, eliminates it.
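The factorisation the abstract describes can be sketched as follows. This is a minimal NumPy illustration of representing a weight matrix by its SVD factors, not the authors' implementation; all variable names are assumptions:

```python
import numpy as np

# Decomposed learning, as described in the abstract, replaces a weight
# matrix W with the factors of its SVD, W = U @ diag(s) @ Vt, and treats
# those factors as the trainable parameters instead of W itself.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # toy weight matrix (shapes are illustrative)

# Factor W via SVD; full_matrices=False gives the compact (thin) form.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# The factored representation reproduces the original weights exactly
# (up to floating-point error), so the forward pass is unchanged at init.
W_reconstructed = U @ np.diag(s) @ Vt
print(np.allclose(W, W_reconstructed))
```

In a training loop, gradients would then flow into `U`, `s`, and `Vt` rather than into a single dense `W`.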