Poster

Identifiable Object Representations under Spatial Ambiguities

Avinash Kori · Francesca Toni · Ben Glocker

East Exhibition Hall A-B #E-3106
Tue 15 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Modular object-centric representations are essential for human-like reasoning but are challenging to obtain under spatial ambiguities, e.g., those caused by occlusions and view ambiguities. Addressing these challenges presents both theoretical and practical difficulties. We introduce a novel multi-view probabilistic approach that aggregates view-specific slots to capture invariant content information while simultaneously learning disentangled global viewpoint-level information. Unlike prior single-view methods, our approach resolves spatial ambiguities, provides theoretical guarantees for identifiability, and requires no viewpoint annotations. Extensive experiments on standard benchmarks and novel complex datasets validate our method's robustness and scalability.

Lay Summary:

Object-centric learning focuses on extracting distinct representations for individual objects within a scene, as opposed to learning a single global representation for the entire scene. A key challenge arises when objects are only partially visible or the scene is viewed from oblique or obscure angles—issues collectively referred to as spatial ambiguities. In this paper, we propose a method specifically designed to address these ambiguities. Our approach involves observing a given scene from multiple viewpoints and leveraging the resulting perspectives to correlate and integrate object-specific features, thereby producing a unified, viewpoint-invariant representation for each object. We provide both theoretical justification and empirical evidence demonstrating that this multi-view correlation strategy yields more robust and reliable object representations.
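
To make the idea of correlating and integrating object-specific features across viewpoints concrete, here is a minimal toy sketch. It is not the authors' probabilistic method: it assumes each view has already been encoded into the same number of slot vectors, aligns them to the first view by brute-force matching on squared distance, and then averages the aligned slots. The function names (align_to_reference, aggregate_views) and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_to_reference(reference, view):
    """Reorder the slots of `view` so they best match `reference`.

    Brute-force search over all permutations; only sensible for a
    handful of slots. (Assumption: slots are comparable by distance.)
    """
    k = len(reference)
    best = min(
        permutations(range(k)),
        key=lambda perm: sum(
            sq_dist(reference[i], view[perm[i]]) for i in range(k)
        ),
    )
    return [view[j] for j in best]


def aggregate_views(views):
    """Align every view to the first one, then average slot-wise."""
    reference = views[0]
    aligned = [reference] + [
        align_to_reference(reference, v) for v in views[1:]
    ]
    n, k, d = len(aligned), len(reference), len(reference[0])
    return [
        [sum(view[i][c] for view in aligned) / n for c in range(d)]
        for i in range(k)
    ]


# Toy data: three views of a two-object scene, two features per slot.
# The second view lists the same objects in swapped slot order.
views = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.2], [0.8, 0.0]],
    [[1.2, 0.0], [0.0, 0.8]],
]
scene_slots = aggregate_views(views)
```

In this toy case the swapped slots of the second view are matched back to the right objects before averaging, so each aggregated slot summarises one object across all views. A practical system would replace the permutation search with a scalable assignment step and the plain mean with a learned or probabilistic aggregation.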
