Bayes optimal learning of attention-indexed models

Poster in Workshop: 3rd Workshop on High-dimensional Learning Dynamics (HiLD), ICML

Fabrizio Boncoraglio · Emanuele Troiani · Vittorio Erba · Lenka Zdeborova



Presented Fri 18 Jul, 9 a.m. to 5:30 p.m. PDT

Abstract:

We introduce the Attention-Indexed Model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs emerge from layered bilinear interactions over high-dimensional embeddings. Unlike prior tractable attention models, AIM allows full-rank key and query matrices, aligning more closely with practical transformers. Using tools from statistical mechanics and random matrix theory, we derive closed-form predictions for Bayes-optimal generalization error and identify sharp phase transitions as a function of sample complexity, model width, and sequence length. We propose a matching Approximate Message Passing algorithm and show that gradient descent can reach optimal performance. AIM offers a solvable playground for understanding learning in modern attention architectures.
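To make the setup more concrete, here is a minimal NumPy sketch of a single attention-indexed layer used as a teacher in a teacher-student experiment. It is an illustration under stated assumptions, not the authors' definition. The helper names (`sample_interaction`, `bilinear_index`, `aim_layer_output`, `make_dataset`) are hypothetical. The scaling S = W Wᵀ/√(rd) with width r, and the subtraction of tr(S) on the diagonal, are assumed conventions chosen to keep the token-token scores of order one. The sketch also shows only one layer, whereas AIM composes several such bilinear interactions.

```python
import numpy as np


def sample_interaction(d, r, rng):
    # Hypothetical teacher weights S = W W^T / sqrt(r * d), with W of shape (d, r).
    # With r >= d, S is full rank almost surely (unlike low-rank attention models).
    W = rng.standard_normal((d, r))
    return W @ W.T / np.sqrt(r * d)


def bilinear_index(X, S):
    # Token-token preactivations h[a, b] = (x_a^T S x_b - delta_ab * tr(S)) / sqrt(d).
    # Subtracting tr(S) on the diagonal keeps self-interactions O(1) (assumed convention).
    T, d = X.shape
    H = X @ S @ X.T - np.trace(S) * np.eye(T)
    return H / np.sqrt(d)


def row_softmax(H):
    # Numerically stable softmax applied to each row.
    Z = np.exp(H - H.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)


def aim_layer_output(X, S):
    # Token-level output: a (T, T) attention matrix built from the bilinear index.
    return row_softmax(bilinear_index(X, S))


def make_dataset(n, T, d, r, seed=0):
    # Teacher-student setup: Gaussian token embeddings labelled by a fixed teacher S.
    rng = np.random.default_rng(seed)
    S_star = sample_interaction(d, r, rng)
    Xs = rng.standard_normal((n, T, d))
    Ys = np.stack([aim_layer_output(X, S_star) for X in Xs])
    return Xs, Ys, S_star


# n samples, each a sequence of T tokens embedded in dimension d; r = d gives a full-rank S.
Xs, Ys, S_star = make_dataset(n=50, T=4, d=20, r=20)
print(Xs.shape, Ys.shape)
print(np.linalg.matrix_rank(S_star))
```

A learner, whether a Bayes-optimal estimator, Approximate Message Passing, or gradient descent in the abstract's terms, would receive `Xs` and `Ys` and try to recover `S_star`. This sketch does not implement any of those estimators. It only shows what "full-rank key and query matrices" and "bilinear interactions over embeddings" can look like in code.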

