Poster
Steerable Transformers for Volumetric Data
Soumyabrata Kundu · Risi Kondor
West Exhibition Hall B2-B3 #W-215
When you rotate a 2-D image or inspect a 3-D scan from another angle, you still recognise what you're looking at, but most artificial-intelligence systems do not. They either make mistakes or must be shown thousands of extra rotated examples to learn the same trick. Our research introduces steerable transformers, a new kind of neural-network layer that automatically understands when an object has simply been moved or rotated. We blend two ideas: (1) the "zoom-out and compare" power of transformers (the technology behind tools like ChatGPT) and (2) earlier steerable convolution methods that focus on local image details while respecting how objects can move in space. The resulting model keeps its accuracy even when pictures or volumes are spun around, yet it needs no extra memory or training time. In tests on hand-written digits, 3-D shape models, skin-lesion photos, and brain-tumour MRI scans, steerable transformers outperform previous rotation-aware networks. Because the method naturally handles full 3-D data, it could help doctors detect tumours, aid self-driving cars in spotting objects from odd angles, and improve any vision task where orientation should not matter.
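For readers who want to see the key property in code, below is a minimal, hypothetical sketch (not the authors' implementation) of the rotation check the summary describes: a rotation-invariant 3-D classifier should give the same prediction for a volume and a rotated copy of it. The `model`, the (N, C, D, H, W) input layout, and the tolerance are all assumptions made for illustration.

```python
import torch

def check_rotation_invariance(model, volume, atol=1e-4):
    """Compare predictions for a 3-D volume and a 90-degree rotated copy.

    90-degree rotations about a grid axis are exact symmetries of the voxel
    grid, so a rotation-invariant classifier should produce (numerically)
    identical outputs for both inputs.

    model  : any hypothetical network mapping (N, C, D, H, W) volumes to logits
    volume : a batch of volumes with shape (N, C, D, H, W)
    """
    model.eval()
    with torch.no_grad():
        original = model(volume)                                 # (N, num_classes)
        rotated = model(torch.rot90(volume, k=1, dims=(3, 4)))   # rotate the H-W plane
    return torch.allclose(original, rotated, atol=atol)
```

In practice one would repeat this check for rotations about each axis (dims=(2, 3), (2, 4), (3, 4)) and for k = 1, 2, 3, since arbitrary rotations of a voxel grid require interpolation and are only approximately exact.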