Poster
Grokking Beyond the Euclidean Norm of Model Parameters
Pascal Jr Tikeng Notsawo · Guillaume Dumas · Guillaume Rabusseau
West Exhibition Hall B2-B3 #W-811
Sometimes when learning, children do not seem to understand something at first — they simply mimic what they see. But after enough repetition, something clicks: they suddenly "get it" and can apply the idea in new situations. The same thing can happen with artificial intelligence (AI). AI models often start by memorizing the training examples. Yet, after a surprisingly long time, they begin to understand the underlying patterns and solve problems they have never seen before. This sudden shift is called grokking.

Our research investigates why grokking happens and how to influence it. We find that it is not just about the model's architecture — grokking also depends on the kind of simplicity (regularization) enforced during training, such as using fewer connections (sparsity) or a simpler internal structure (low-rankness). In some cases, we even show that grokking is necessary for a model to reach an optimal solution. However, when simplicity is enforced, training can take significantly longer. There is a tradeoff: small regularization can improve generalization, but it requires more training time. Our results provide a way to manage this tradeoff based on the resources available and the kind of behavior we want from the model.

These insights help explain why some AI systems require much more training than expected to reach deep understanding — and how we can guide them more effectively.