

Poster

Probabilistic Interactive 3D Segmentation with Hierarchical Neural Processes

Jie Liu · Pan Zhou · Zehao Xiao · Jiayi Shen · Wenzhe Yin · Jan-jakob Sonke · Efstratios Gavves

West Exhibition Hall B2-B3 #W-218
Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

Interactive 3D segmentation has emerged as a promising solution for generating accurate object masks in complex 3D scenes by incorporating user-provided clicks. However, two critical challenges remain underexplored: (1) effectively generalizing from sparse user clicks to produce accurate segmentations, and (2) quantifying predictive uncertainty to help users identify unreliable regions. In this work, we propose NPISeg3D, a novel probabilistic framework that builds upon Neural Processes (NPs) to address these challenges. Specifically, NPISeg3D introduces a hierarchical latent variable structure with scene-specific and object-specific latent variables to enhance few-shot generalization by capturing both global context and object-specific characteristics. Additionally, we design a probabilistic prototype modulator that adaptively modulates click prototypes with object-specific latent variables, improving the model's ability to capture object-aware context and quantify predictive uncertainty. Experiments on four 3D point cloud datasets demonstrate that NPISeg3D achieves superior segmentation performance with fewer clicks while providing reliable uncertainty estimates.
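
As a rough illustration of the hierarchical latent structure and prototype modulation described above, here is a minimal PyTorch-style sketch. It is not the authors' implementation; the module names, feature dimensions, Gaussian parameterization, and the FiLM-style modulation are assumptions made purely for illustration.

```python
# Hypothetical sketch of a hierarchical latent structure over user clicks and a
# prototype modulator, loosely following the abstract. All names and shapes are
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class HierarchicalLatents(nn.Module):
    """Infer a scene-specific latent and per-object latents from click features."""

    def __init__(self, feat_dim: int = 64, latent_dim: int = 32):
        super().__init__()
        # Amortized inference heads producing Gaussian parameters (mean, log-variance).
        self.scene_head = nn.Linear(feat_dim, 2 * latent_dim)
        self.object_head = nn.Linear(feat_dim + latent_dim, 2 * latent_dim)

    @staticmethod
    def sample(params: torch.Tensor) -> torch.Tensor:
        # Reparameterized sample from a diagonal Gaussian.
        mu, logvar = params.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, click_feats: torch.Tensor):
        # click_feats: (num_objects, num_clicks, feat_dim) features at user clicks.
        scene_ctx = click_feats.mean(dim=(0, 1))            # global scene context
        z_scene = self.sample(self.scene_head(scene_ctx))   # scene-specific latent
        obj_ctx = click_feats.mean(dim=1)                    # per-object context
        z_scene_rep = z_scene.expand(obj_ctx.size(0), -1)
        z_obj = self.sample(
            self.object_head(torch.cat([obj_ctx, z_scene_rep], dim=-1))
        )                                                    # object-specific latents
        return z_scene, z_obj


class PrototypeModulator(nn.Module):
    """Modulate click prototypes with object-specific latents (FiLM-style, assumed)."""

    def __init__(self, feat_dim: int = 64, latent_dim: int = 32):
        super().__init__()
        self.to_scale_shift = nn.Linear(latent_dim, 2 * feat_dim)

    def forward(self, prototypes: torch.Tensor, z_obj: torch.Tensor) -> torch.Tensor:
        # prototypes: (num_objects, feat_dim) mean click features per object.
        scale, shift = self.to_scale_shift(z_obj).chunk(2, dim=-1)
        return prototypes * (1 + scale) + shift


# Example usage with random click features (2 objects, 5 clicks each, 64-dim features):
# latents = HierarchicalLatents()
# z_scene, z_obj = latents(torch.randn(2, 5, 64))
# modulated_prototypes = PrototypeModulator()(torch.randn(2, 64), z_obj)
```

In a design like this, drawing several samples of the object-specific latent yields several modulated prototypes, and hence several candidate masks; their disagreement can serve as a per-point uncertainty signal, in the spirit of the uncertainty quantification described in the abstract.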

Lay Summary:

Teaching a computer to recognize 3D objects, such as chairs, tables, or people in a cluttered room, is challenging, especially when only a few user clicks are provided as guidance. Our research explores how to make this process both smarter and more trustworthy.

We developed a method called NPISeg3D that learns to segment objects in 3D environments with just a small number of clicks from the user. It works by building a kind of "mental model" of each scene and object, helping the system understand both the big picture and the details. This allows it to make better guesses, even with little input.

But we didn't stop there: we also taught the system to know when it might be wrong, which helps users spot and fix mistakes easily. Our method shows strong results across multiple 3D datasets, providing more accurate segmentations with fewer clicks and more confidence in the results.
