Large-scale Robust Enhanced Ensemble Clustering via Outlier Decoupling
Abstract
Ensemble clustering derives a consensus partition from multiple base clustering results. Anchor-based methods build compact similarity representations from anchors, which substantially improves computational efficiency. However, when outliers contaminate the data, reconstructing the base clustering results often yields biased anchors, which degrade the anchor similarity matrix and reduce clustering accuracy. To address this issue, we propose large-scale robust enhanced ensemble clustering via outlier decoupling (RANGE). Specifically, RANGE first converts the base clustering results into an initial bipartite graph and improves its reliability with a high-order fuzzy enhancement strategy (HFES) designed for such graphs. Next, a mapping matrix filters redundant information from the enhanced bipartite graph, and RANGE reconstructs the mapped bipartite graph via matrix factorization, introducing an anchor matrix to further improve computational efficiency. To improve robustness, RANGE incorporates a decoupling term that separates the clean clustering structure from the outlier-contaminated structure in the anchor space. This decoupling mechanism enables robust ensemble clustering. Moreover, by applying outlier detectors to the decoupled outlier structure, RANGE extends to the outlier-detection task. RANGE thus forms a general cross-task framework in which both tasks retain linear time complexity. Extensive cross-domain experiments indicate that RANGE delivers superior performance in both clustering validity and outlier detection. The code is available in the supplementary material.
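As a minimal illustration of the first step, the conversion of base clustering results into an initial bipartite graph can be sketched as follows. The sketch assumes the common construction in which each base clustering is one-hot encoded and the indicator blocks are concatenated column-wise, so that rows are samples and columns are base clusters. The abstract does not specify the exact construction used by RANGE, and the function name `build_bipartite_graph` and the toy labels are hypothetical.

```python
from typing import List, Sequence


def build_bipartite_graph(base_labels: Sequence[Sequence[int]]) -> List[List[int]]:
    """Stack one-hot cluster indicators of every base clustering column-wise.

    base_labels[m][i] is the cluster label of sample i in base clustering m.
    Returns an n x c 0/1 matrix, where c is the total number of base clusters.
    This is an assumed, generic construction, not necessarily the one in RANGE.
    """
    n = len(base_labels[0])
    rows: List[List[int]] = [[] for _ in range(n)]
    for labels in base_labels:
        if len(labels) != n:
            raise ValueError("every base clustering must label all n samples")
        # Map the labels of this base clustering to consecutive column indices.
        cluster_ids = sorted(set(labels))
        column_of = {cid: j for j, cid in enumerate(cluster_ids)}
        for i, label in enumerate(labels):
            one_hot = [0] * len(cluster_ids)
            one_hot[column_of[label]] = 1
            rows[i].extend(one_hot)
    return rows


# Three hypothetical base clusterings of five samples.
base = [
    [0, 0, 1, 1, 1],
    [0, 1, 1, 2, 2],
    [1, 1, 0, 0, 0],
]
B = build_bipartite_graph(base)
```

Under this assumption, each row of `B` contains exactly one nonzero entry per base clustering, so the graph grows linearly with the number of samples. A sparse representation would normally replace the dense lists at scale.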