Poster
Fundamental limits of learning in sequence multi-index models and deep attention networks: high-dimensional asymptotics and sharp thresholds
Emanuele Troiani · Hugo Cui · Yatin Dandi · Florent Krzakala · Lenka Zdeborová
West Exhibition Hall B2-B3 #W-801
Modern AI systems often rely on attention networks, models that capture relationships between different parts of their input. These are central to breakthroughs such as large language models. But how do such complex networks actually learn, and how much data do they need before they start making accurate predictions?

We studied deep attention networks in a simplified, low-rank setting, which allowed us to derive exact results on the fundamental limits of learning from data. Our analysis provides a comprehensive mathematical framework for understanding when learning is possible and how well these models can perform.

One of our most surprising findings is that the network does not learn everything at once: instead, its layers are learned sequentially. The model gradually builds up complexity, layer by layer, a behavior we also observe in more realistic scenarios.
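For a concrete picture of the "sequence multi-index" structure, here is a minimal NumPy sketch: the label of a sequence depends on its high-dimensional tokens only through a few hidden low-rank projections, which are then passed through a toy attention-style readout. The dimensions, the helper toy_attention_readout, and the readout itself are illustrative assumptions, not the exact model analyzed in the paper.

```python
import numpy as np

# Sketch of a sequence multi-index teacher (illustrative, not the paper's exact setup):
# the label depends on the input sequence X (L tokens, each of dimension d) only
# through r hidden low-rank directions W, with r and L much smaller than d.

rng = np.random.default_rng(0)
L, d, r = 4, 1000, 2                     # sequence length, token dimension, number of indices

W = rng.standard_normal((d, r))          # hidden low-rank directions (the "indices")
X = rng.standard_normal((L, d))          # one input sequence of L tokens

Z = X @ W / np.sqrt(d)                   # (L, r) low-dimensional summary of the sequence

def toy_attention_readout(Z):
    # Toy attention acting on the projected tokens: softmax over token-token
    # similarity scores, followed by a simple average readout.
    scores = Z @ Z.T                     # (L, L) similarities between tokens
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)    # row-wise softmax
    return (A @ Z).mean()                # scalar label

y = toy_attention_readout(Z)
print(y)
```

In this toy setting, a learner only needs to recover the few directions in W to predict the labels, which is what makes an exact high-dimensional analysis of the required sample size tractable.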