

Poster

Geometric Feature Embedding for Effective 3D Few-Shot Class Incremental Learning

Xiangqi Li · Libo Huang · Zhulin An · Weilun Feng · Chuanguang Yang · Boyu Diao · Fei Wang · Yongjun Xu

West Exhibition Hall B2-B3 #W-217
Wed 16 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

3D few-shot class incremental learning (FSCIL) aims to learn new point cloud categories from limited samples while preventing the forgetting of previously learned categories. This research area significantly enhances the capabilities of self-driving vehicles and computer vision systems. Existing 3D FSCIL approaches primarily utilize multimodal pre-trained models to extract semantic features, making them heavily dependent on meticulously designed high-quality prompts and fine-tuning strategies. To reduce this dependence, this paper proposes a novel method for 3D FSCIL with Embedded Geometric features (3D-FLEG). Specifically, 3D-FLEG develops a point cloud geometric feature extraction module to capture category-related geometric characteristics. To address the modality heterogeneity that arises from integrating geometric and text features, 3D-FLEG introduces a geometric feature embedding module. By augmenting text prompts with spatial geometric features through these modules, 3D-FLEG can learn robust representations of new categories even with limited samples, while mitigating forgetting of previously learned categories. Experiments conducted on several publicly available 3D point cloud datasets, including ModelNet, ShapeNet, ScanObjectNN, and CO3D, demonstrate 3D-FLEG's superiority over existing state-of-the-art 3D FSCIL methods. Code is available at https://github.com/lixiangqi707/3D-FLEG.
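The abstract names two components: a geometric feature extraction module and a geometric feature embedding module that injects the extracted features into text prompts. The PyTorch sketch below is one plausible, simplified reading of that design, not the authors' implementation (see their repository for that): it assumes a PointNet-style extractor, a linear projection into the text embedding space, and simple additive fusion, all of which are illustrative assumptions.

```python
# Hypothetical sketch (not the 3D-FLEG code): augmenting text prompt embeddings
# with point cloud geometric features before similarity-based classification.
# Module names, dimensions, and the additive fusion rule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GeometricFeatureExtractor(nn.Module):
    """Toy PointNet-style encoder: per-point MLP followed by max pooling."""
    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> global geometric feature (B, out_dim)
        per_point = self.mlp(points)
        return per_point.max(dim=1).values


class GeometricFeatureEmbedding(nn.Module):
    """Projects geometric features into the text embedding space and fuses
    them with class prompt embeddings, bridging the modality gap."""
    def __init__(self, geo_dim: int = 256, text_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(geo_dim, text_dim)

    def forward(self, text_emb: torch.Tensor, geo_feat: torch.Tensor) -> torch.Tensor:
        # text_emb: (C, text_dim) one prompt embedding per class
        # geo_feat: (B, geo_dim)  one geometric feature per point cloud
        geo_emb = self.proj(geo_feat)                          # (B, text_dim)
        # Additive fusion: every class prompt is augmented with the sample's
        # geometric cue before cosine-similarity classification.
        fused = text_emb.unsqueeze(0) + geo_emb.unsqueeze(1)   # (B, C, text_dim)
        return F.normalize(fused, dim=-1)


if __name__ == "__main__":
    B, N, C = 4, 1024, 10                          # batch, points per cloud, classes
    points = torch.randn(B, N, 3)
    text_emb = torch.randn(C, 512)                 # stand-in for frozen text-encoder output
    pc_emb = F.normalize(torch.randn(B, 512), -1)  # stand-in pretrained point features

    extractor = GeometricFeatureExtractor()
    embedder = GeometricFeatureEmbedding()
    fused_prompts = embedder(text_emb, extractor(points))     # (B, C, 512)
    logits = torch.einsum("bd,bcd->bc", pc_emb, fused_prompts)
    print(logits.shape)  # torch.Size([4, 10])
```

In a few-shot incremental setting, the pretrained text and point encoders would typically stay frozen while the lightweight extractor and projection adapt to new classes; that training protocol is likewise an assumption here rather than a detail taken from the paper.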

Lay Summary:

3D Few-Shot Class Incremental Learning (FSCIL) is an emerging research area with significant potential to enhance the adaptability and perception capabilities of autonomous systems and computer vision applications. Current approaches predominantly rely on multimodal pre-trained models for semantic feature extraction, which often require carefully designed prompts and elaborate fine-tuning strategies. In this paper, we propose a novel framework named 3D-FLEG that aims to reduce this dependency. 3D-FLEG integrates geometric features with textual prompts through geometric feature extraction and embedding modules, effectively leveraging the spatial structure of 3D data. This approach significantly alleviates the reliance on complex textual descriptions, particularly in few-shot settings, enabling robust recognition of novel classes while mitigating catastrophic forgetting. Experimental results demonstrate that our method achieves state-of-the-art performance on both within-dataset and cross-dataset 3D FSCIL benchmarks.
