Poster
LieRE: Lie Rotational Positional Encodings
Sophie Ostmeier · Brian Axelrod · Maya Varma · Michael Moseley · Akshay Chaudhari · Curtis Langlotz
West Exhibition Hall B2-B3 #W-308
Transformer architectures depend on explicit position encodings to capture token positional information. Rotary Position Embedding (RoPE) has emerged as a popular choice in language models due to its efficient encoding of relative position information through key-query rotations. However, RoPE faces significant limitations beyond language processing: it is constrained to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity. We address these challenges with Lie Relative Encodings (LieRE), which generalizes RoPE to high-dimensional rotation matrices by leveraging their Lie group structure. Through extensive evaluation on three image datasets spanning 2D and 3D classification tasks, LieRE achieves a 1.5% improvement over state-of-the-art baselines on 2D tasks and a 1% improvement on 3D tasks, while demonstrating superior generalization to higher resolutions. Our implementation is computationally efficient, with results on CIFAR-100 reproducible in 30 minutes on four A100 GPUs. Our code is available at https://github.com/StanfordMIMI/LieRE.
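To make the key-query rotation mechanism concrete, here is a minimal PyTorch sketch of 1D RoPE. It is illustrative only: the function name, channel-pairing layout, and frequency schedule are our assumptions, not the paper's reference implementation.

```python
import torch

def rope_rotate(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate each 2D channel pair of x by an angle proportional to its 1D position."""
    d = x.shape[-1]
    # One frequency per channel pair, following the standard RoPE schedule.
    freqs = base ** (-torch.arange(0, d, 2, dtype=x.dtype) / d)   # (d/2,)
    angles = positions[:, None] * freqs[None, :]                  # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Rotations compose, so the dot product of rotated queries and keys depends on
# positions only through the offset i - j: relative position enters attention
# scores without any extra terms.
q, k = torch.randn(8, 64), torch.randn(8, 64)   # 8 tokens, 64-dim heads
pos = torch.arange(8, dtype=q.dtype)
q_rot, k_rot = rope_rotate(q, pos), rope_rotate(k, pos)
```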
Transformers are widely used in AI, but they need position information to understand the structure of data like images or 3D scenes. For example, when processing an image, the model divides it into small blocks called patches. Each patch becomes a token, but these tokens carry no information about their original location unless position encodings are added.

LieRE improves how transformers handle position encodings, especially in high-dimensional settings. It encodes token positions as rotation matrices, leveraging the relationship between skew-symmetric matrices and dense rotation matrices (via the matrix exponential) to preserve geometric structure.

LieRE builds on RoPE (Rotary Position Embedding), which uses block-diagonal matrices of 2x2 rotations to encode relative positions directly into the model's attention computation. While RoPE was originally developed for one-dimensional sequence data like text, LieRE generalizes these ideas to higher dimensions, allowing the model to reason about both absolute and relative positions in complex spatial data. A sketch of this generalization appears below.
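As a rough illustration of the idea described above, the following PyTorch sketch maps an n-dimensional position to a dense rotation matrix by exponentiating a position-weighted sum of learned skew-symmetric generators. All names, shapes, and hyperparameters here are hypothetical; the official implementation at https://github.com/StanfordMIMI/LieRE may differ in detail.

```python
import torch

def liere_rotation(generators: torch.Tensor, position: torch.Tensor) -> torch.Tensor:
    """Map an n-D position to a dense rotation matrix in SO(d)."""
    # Skew-symmetrize the raw parameters; the matrix exponential of a
    # skew-symmetric matrix is always a rotation, which is the Lie-group
    # relationship between the two classes of matrices.
    skew = generators - generators.transpose(-1, -2)      # (n, d, d)
    A = torch.einsum("n,nij->ij", position, skew)         # skew-symmetric (d, d)
    return torch.linalg.matrix_exp(A)

d, n_dims = 16, 2                                # head dim; 2 spatial dims for images
gen = 0.02 * torch.randn(n_dims, d, d)           # hypothetical learnable parameters
R_p = liere_rotation(gen, torch.tensor([3.0, 5.0]))   # patch at row 3, column 5
R_q = liere_rotation(gen, torch.tensor([1.0, 2.0]))
```

Rotating queries and keys with these matrices injects position into attention: the score q^T R_p^T R_q k mixes absolute and relative information, and reduces to a purely relative interaction (as in RoPE) when the learned generators commute.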