Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)
Decomposed Learning: An Avenue for Mitigating Grokking
Gabryel Mason-Williams · Israel Mason-Williams
Keywords: [ compression ] [ grokking ] [ SVD ] [ linear algebra ] [ optimisation ]
Abstract:
Grokking is a delayed transition from memorisation to generalisation in neural networks. It challenges perspectives on efficient learning, particularly in structured tasks and small-data regimes. We study grokking in modular arithmetic from the perspective of a training pathology. Using Singular Value Decomposition (SVD), we re-parameterise each weight matrix $W$ of a neural network as the product of three matrices, $U$, $\Sigma$ and $V^T$. Through empirical evaluations on the modular addition task, we show that this representation significantly reduces the effect of grokking and, in some cases, eliminates it.
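The core re-parameterisation can be sketched as follows. Only the $W = U \Sigma V^T$ factorisation comes from the abstract; the network shape, the random initialisation, and the `forward` helper are illustrative assumptions, and the actual training setup (optimiser, modular-addition architecture) is not shown here.

```python
import numpy as np

# Illustrative sketch: represent a weight matrix W by its SVD factors,
# which would then be trained in place of W itself.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # hypothetical layer weight

# Decompose W into U, Sigma (as a vector S of singular values), V^T.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# The factored form reconstructs W exactly (up to floating-point error).
W_rec = U @ np.diag(S) @ Vt
assert np.allclose(W, W_rec)

def forward(x, U, S, Vt):
    # Forward pass using the decomposed representation: the product
    # U @ diag(S) @ Vt stands in for the original weight matrix W.
    return x @ (U @ np.diag(S) @ Vt).T
```

In the decomposed setting, `U`, `S`, and `Vt` would each be trainable parameters, so gradient updates act on the factors rather than on a single dense matrix.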