Poster
Generalization Performance of Ensemble Clustering: From Theory to Algorithm
Xu Zhang · Haoye Qiu · Weixuan Liang · Hui LIU · Junhui Hou · Yuheng Jia
East Exhibition Hall A-B #E-1901
Ensemble clustering is a widely used technique that combines multiple clustering results to achieve higher robustness and accuracy. It has found applications in fields such as image analysis, customer segmentation, and bioinformatics. Despite its empirical success, its theoretical underpinnings remain largely unexplored. This work provides a rigorous analysis of the generalization performance of ensemble clustering. We derive bounds for the generalization error and excess risk, and characterize the asymptotic consistency of ensemble clustering. Our results demonstrate that increasing the number of samples alone is insufficient to guarantee performance gains; rather, the number and diversity of base clusterings are critical factors. Building upon this theoretical framework, we propose a novel weighted ensemble clustering algorithm that jointly minimizes bias and maximizes diversity across the base clusterings. Extensive experiments on real-world datasets confirm that our method consistently outperforms state-of-the-art techniques, with average improvements exceeding 6%. This study not only advances the theoretical understanding of ensemble clustering but also offers practical insights into the design of more effective and principled clustering algorithms.
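For readers new to the topic, the sketch below illustrates what "combining multiple clustering results" with per-clustering weights can look like in practice. It uses the generic co-association (evidence accumulation) approach, which is an assumption made here for illustration and is not the algorithm proposed in this work. The function names `weighted_coassociation` and `consensus_labels`, the 0.5 threshold, and the toy labelings are all hypothetical; the abstract does not specify how the weights are chosen, so uniform weights are used by default.

```python
from typing import List, Optional, Sequence


def weighted_coassociation(
    labelings: Sequence[Sequence[int]],
    weights: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Weighted fraction of base clusterings that put points i and j together.

    Illustrative only: the weights are plain user-supplied numbers, not the
    bias/diversity-based weights described in the abstract.
    """
    m = len(labelings)
    n = len(labelings[0])
    if weights is None:
        weights = [1.0] * m  # uniform weights (assumption)
    total = float(sum(weights))
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                # Only co-membership matters, so label permutations
                # across base clusterings are harmless.
                if labels[i] == labels[j]:
                    co[i][j] += w / total
    return co


def consensus_labels(co: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Consensus clusters = connected components of the graph that links
    i and j whenever co[i][j] > threshold (a simple hypothetical rule)."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three toy base clusterings of five points (made-up data).
base = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2],
    [0, 0, 0, 1, 1],
]
co = weighted_coassociation(base)
print(consensus_labels(co))  # → [0, 0, 1, 1, 1]
```

Passing a non-uniform `weights` list down-weights base clusterings that are considered less reliable; how to pick such weights in a principled way is exactly what the paper addresses and is not reproduced here.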