ICML Poster Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios

Spotlight Poster

Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios

Xihong Yang · Siwei Wang · Fangdi Wang · Jiaqi Jin · Suyuan Liu · Yue Liu · En Zhu · Xinwang Liu · Yueming Jin

East Exhibition Hall A-B #E-1901



Tue 15 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

In recent years, deep multi-view clustering methods, leveraging powerful representation learning capabilities, have demonstrated reliable performance by effectively integrating multi-source information from diverse views. Most existing methods rely on the assumption of clean views. However, noise is pervasive in real-world scenarios and leads to significant performance degradation. To tackle this problem, we propose a novel multi-view clustering framework for the automatic identification and rectification of noisy data, termed AIRMVC. Specifically, we reformulate noise identification as an anomaly identification problem using a Gaussian Mixture Model (GMM). We then design a hybrid rectification strategy that uses the identification results to mitigate the adverse effects of noisy data. Furthermore, we introduce a noise-robust contrastive mechanism to generate reliable representations. Additionally, we provide a theoretical proof demonstrating that these representations can discard noisy information, thereby improving the performance of downstream tasks. Extensive experiments on six benchmark datasets demonstrate that AIRMVC outperforms state-of-the-art algorithms in terms of robustness in noisy scenarios. The code of AIRMVC is available on GitHub at https://github.com/xihongyang1999/AIRMVC.

Lay Summary:

Modern AI systems often group data by leveraging multiple views or perspectives, such as different sensor readings or feature sets, to discover underlying patterns, a process known as multi-view clustering. In practice, however, one or more of these views can be corrupted by noise (faulty sensors, labeling errors, or environmental interference), which can dramatically degrade clustering quality.

To address this challenge, we introduce AIRMVC, a novel framework that first treats noise detection as an anomaly-identification problem using Gaussian Mixture Models. Once noisy samples are flagged, a hybrid rectification strategy repairs the corrupted views by selectively correcting their data. We further enhance robustness with a noise-resistant contrastive learning mechanism.

We also provide a theoretical guarantee: the representations generated by AIRMVC can discard noisy information, thereby improving the performance of downstream tasks. Extensive experiments on six benchmark datasets demonstrate that AIRMVC consistently outperforms state-of-the-art multi-view clustering algorithms, offering a more resilient solution for real-world noisy environments.
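The GMM-based identification step can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say which per-sample statistic the mixture is fitted on, so the sketch assumes a scalar score per sample (for example, a loss value) where noisy samples tend to score higher. The helper names `fit_two_component_gmm`, `noisy_posterior`, and `flag_noisy`, and the 0.5 threshold, are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(scores, iters=100):
    """Fit a 1-D, two-component GMM with plain EM.

    Assumption: `scores` holds one scalar per sample (e.g. a loss).
    Returns (weights, means, variances), each a list of length 2.
    """
    lo, hi = min(scores), max(scores)
    spread = (hi - lo) or 1.0
    weights = [0.5, 0.5]
    means = [lo, hi]  # start one component at each extreme
    variances = [(spread / 4.0) ** 2] * 2
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in scores:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component collapsed; keep its old parameters
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            variances[k] = max(var, 1e-6)  # variance floor for stability
    return weights, means, variances


def noisy_posterior(scores, params):
    """Posterior probability that each sample belongs to the higher-mean component."""
    weights, means, variances = params
    noisy = 0 if means[0] > means[1] else 1
    out = []
    for x in scores:
        dens = [w * gaussian_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300
        out.append(dens[noisy] / total)
    return out


def flag_noisy(scores, threshold=0.5):
    """Flag samples whose posterior of being in the noisy component exceeds threshold."""
    params = fit_two_component_gmm(scores)
    return [p > threshold for p in noisy_posterior(scores, params)]
```

A rectification step would then act only on the flagged samples. Raising the threshold above 0.5 trades recall of noisy samples for fewer false alarms on clean ones.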
