

Poster

Noise-Guided Predicate Representation Extraction and Diffusion-Enhanced Discretization for Scene Graph Generation

Guoqing Zhang · Shichao Kan · Fanghui Zhang · Wanru Xu · Yue Zhang · Yigang Cen

East Exhibition Hall A-B #E-3613
Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

Scene Graph Generation (SGG) is a fundamental task in visual understanding, aimed at providing fine-grained local comprehension for downstream applications. Existing SGG methods often overlook the diversity of predicate representations and the consistency among similar predicates when dealing with long-tail distributions. As a result, the model's decision layer fails to effectively capture tail-class details, leading to biased predictions. To address this, we propose a Noise-Guided Predicate Representation Extraction and Diffusion-Enhanced Discretization (NoDIS) method. On the one hand, expanding the predicate representation space enhances the model's ability to learn both common and rare predicates, thus reducing the prediction bias caused by data scarcity; to this end, we propose a conditional diffusion model that reconstructs features and increases the diversity of representations for same-category predicates. On the other hand, independent predicate representations in the decision phase increase the learning complexity of the decision layer, making accurate predictions more challenging; to address this, we introduce a discretization mapper that learns consistent representations among similar predicates, reducing the learning difficulty and decision ambiguity of the decision layer. To validate the effectiveness of our method, we integrate NoDIS with various SGG baseline models and conduct experiments on multiple datasets. The results consistently demonstrate superior performance.
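The abstract leaves the architecture unspecified, so the sketch below only illustrates the general idea of conditional-diffusion feature reconstruction, not the NoDIS implementation. All names (CondDenoiser, make_schedule, diffusion_loss), the feature dimensions, and the choice to condition the denoiser on the original relation feature are assumptions: a noised predicate feature is denoised under that condition with the standard DDPM noise-prediction objective, and the reconstructed samples could then serve as additional, more diverse training representations for the same predicate class.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CondDenoiser(nn.Module):
    # Tiny MLP denoiser: given a noised predicate feature x_t, a condition
    # vector (here assumed to be the original relation feature), and the
    # diffusion step t, predict the noise that was added.
    def __init__(self, feat_dim=512, cond_dim=512, hidden=1024, n_steps=100):
        super().__init__()
        self.t_embed = nn.Embedding(n_steps, hidden)
        self.net = nn.Sequential(
            nn.Linear(feat_dim + cond_dim + hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, x_t, cond, t):
        return self.net(torch.cat([x_t, cond, self.t_embed(t)], dim=-1))

def make_schedule(n_steps=100, beta_start=1e-4, beta_end=2e-2):
    # Linear beta schedule; the cumulative product drives the closed-form noising.
    betas = torch.linspace(beta_start, beta_end, n_steps)
    return torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(denoiser, x0, cond, alphas_cumprod):
    # Standard DDPM objective: noise x0 at a random step t, predict the noise back.
    t = torch.randint(0, alphas_cumprod.size(0), (x0.size(0),), device=x0.device)
    a_bar = alphas_cumprod.to(x0.device)[t].unsqueeze(-1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return F.mse_loss(denoiser(x_t, cond, t), noise)

# Toy usage: 8 predicate features of dimension 512, conditioned on themselves.
feats = torch.randn(8, 512)
denoiser = CondDenoiser()
a_bar = make_schedule()
loss = diffusion_loss(denoiser, feats, feats, a_bar)
loss.backward()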

Lay Summary:

To address the widespread issue of data imbalance (long-tailed distribution), we propose a feature enhancement method based on diffusion models and discretization mapping. This approach leverages the generative capability of diffusion models to perform online feature augmentation, while the discretization mapping aggregates representations of semantically similar predicates to relieve pressure on the decision layer. When applied to the scene graph generation task, our method effectively mitigates the biased predictions caused by long-tailed distributions and achieves strong performance across multiple datasets.
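The discretization mapping is likewise described only at a high level. One plausible reading of "aggregates representations of semantically similar predicates" is a vector-quantization-style codebook; the sketch below assumes that reading (the class DiscretizationMapper, the codebook size, and the commitment loss are hypothetical, not the paper's design).

import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscretizationMapper(nn.Module):
    # Snap each continuous predicate feature to its nearest entry in a small
    # learned codebook, so semantically similar predicates end up sharing
    # (near-)identical representations at the decision layer.
    def __init__(self, feat_dim=512, n_codes=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, feat_dim)
        self.beta = beta  # weight of the commitment term

    def forward(self, z):
        d = torch.cdist(z, self.codebook.weight)   # (B, n_codes) distances
        idx = d.argmin(dim=-1)                     # nearest code per feature
        z_q = self.codebook(idx)                   # quantized features
        # VQ-VAE-style losses: pull codes toward features and vice versa.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        # Straight-through estimator keeps gradients flowing to the encoder.
        z_q = z + (z_q - z).detach()
        return z_q, idx, loss

# Toy usage.
mapper = DiscretizationMapper()
z_q, idx, vq_loss = mapper(torch.randn(8, 512))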
