Poster
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
Roman Abramov · Felix Steinbauer · Gjergji Kasneci
East Exhibition Hall A-B #E-2809
Many real-world questions require linking facts across multiple sources, like determining a person’s birth year by first identifying their spouse and then tracing their birth date. Current AI models struggle with this because they tend to memorize isolated facts without understanding how they connect.

We tackled this by adding synthetic links to the training data of language models. These artificial connections aren’t always factual, but they force the models to focus on linking facts instead of just memorizing them. Surprisingly, the "fake" connections don’t harm the models; instead, they push them to develop their own reasoning pathways.

Our experiments showed that this approach helps AI models go beyond memorization, enabling them to answer complex, multi-step questions with impressive accuracy. This could lead to more powerful and trustworthy AI systems in many different fields.
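
To make the idea of "synthetic links" concrete, here is a minimal Python sketch. It is not the authors' pipeline: the triple format, the function names `augment_with_synthetic_links` and `two_hop_examples`, and the toy facts are all assumptions made for illustration. It adds randomly generated (and possibly non-factual) links to a small set of facts, then composes pairs of linked facts into two-hop training examples.

```python
import random

def augment_with_synthetic_links(triples, num_synthetic, seed=0):
    """Return the original (head, relation, tail) triples plus up to
    `num_synthetic` randomly generated triples that are not already present.
    The synthetic triples are NOT guaranteed to be factual (illustrative only)."""
    rng = random.Random(seed)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    relations = sorted({r for _, r, _ in triples})
    existing = set(triples)
    augmented = list(triples)
    attempts = 0
    # Cap the number of attempts so the loop always terminates.
    while len(augmented) - len(triples) < num_synthetic and attempts < 100 * num_synthetic:
        attempts += 1
        head, tail = rng.sample(entities, 2)  # two distinct entities
        candidate = (head, rng.choice(relations), tail)
        if candidate not in existing:
            existing.add(candidate)
            augmented.append(candidate)
    return augmented

def two_hop_examples(triples):
    """Compose facts (a, r1, b) and (b, r2, c) into two-hop examples
    whose question is (a, r1, r2) and whose answer is c."""
    by_head = {}
    for h, r, t in triples:
        by_head.setdefault(h, []).append((r, t))
    examples = []
    for h, r1, bridge in triples:
        for r2, answer in by_head.get(bridge, []):
            examples.append({"question": (h, r1, r2), "answer": answer})
    return examples

# Hypothetical toy facts, mirroring the spouse / birth-year example above.
facts = [
    ("Alice", "spouse", "Bob"),
    ("Bob", "birth_year", "1970"),
    ("Carol", "spouse", "Dave"),
]
augmented = augment_with_synthetic_links(facts, num_synthetic=5)
print(len(two_hop_examples(facts)), "two-hop examples before augmentation")
print(len(two_hop_examples(augmented)), "two-hop examples after augmentation")
```

The sampling here is deliberately naive: it ignores entity types, so it can produce implausible links. That is in keeping with the point above that the added connections need not be factual, but a real augmentation scheme would likely constrain them more carefully.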