Poster
Steerable Transformers for Volumetric Data
Soumyabrata Kundu · Risi Kondor
West Exhibition Hall B2-B3 #W-215
When you rotate a 2-D image or inspect a 3-D scan from another angle, you still recognise what you're looking at, but most artificial-intelligence systems do not. They either make mistakes or must be shown thousands of extra rotated examples to learn the same trick. Our research introduces steerable transformers, a new kind of neural-network layer that automatically understands when an object has simply been moved or rotated. We blend two ideas: (1) the "zoom-out and compare" power of transformers (the technology behind tools like ChatGPT) and (2) earlier steerable convolution methods that focus on local image details while respecting how objects can move in space. The resulting model keeps its accuracy even when pictures or volumes are spun around, yet it needs no extra memory or training time. In tests on hand-written digits, 3-D shape models, skin-lesion photos, and brain-tumour MRI scans, steerable transformers outperform previous rotation-aware networks. Because the method naturally handles full 3-D data, it could help doctors detect tumours, aid self-driving cars in spotting objects from odd angles, and improve any vision task where orientation should not matter.
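For readers who want to see the key property in code, below is a minimal, hypothetical sketch (not the authors' implementation) of the rotation check the summary describes: a rotation-invariant 3-D classifier should give the same prediction for a volume and a rotated copy of it. The `model`, the (N, C, D, H, W) input layout, and the tolerance are all assumptions made for illustration.

```python
import torch

def check_rotation_invariance(model, volume, atol=1e-4):
    """Compare predictions for a 3-D volume and a 90-degree rotated copy.

    90-degree rotations about a grid axis are exact symmetries of the voxel
    grid, so a rotation-invariant classifier should produce (numerically)
    identical outputs for both inputs.

    model  : any hypothetical network mapping (N, C, D, H, W) volumes to logits
    volume : a batch of volumes with shape (N, C, D, H, W)
    """
    model.eval()
    with torch.no_grad():
        original = model(volume)                                 # (N, num_classes)
        rotated = model(torch.rot90(volume, k=1, dims=(3, 4)))   # rotate the H-W plane
    return torch.allclose(original, rotated, atol=atol)
```

In practice one would repeat this check for rotations about each axis (dims=(2, 3), (2, 4), (3, 4)) and for k = 1, 2, 3, since arbitrary rotations of a voxel grid require interpolation and are only approximately exact.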