A clustering ensemble algorithm for handling deep embeddings using cluster confidence

Cited by: 0
Authors
Zeng, Lingbin [1]
Yao, Shixin [1]
Liu, Xinwang [1]
Xiao, Liquan [1]
Qian, Yue [1]
Affiliations
[1] Natl Univ Def Technol, 109 Deya Rd, Changsha, Hunan, Peoples R China
Source
COMPUTER JOURNAL | 2024 / Volume 68 / Issue 02
Funding
National Key Research and Development Program;
DOI
10.1093/comjnl/bxae101
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Clustering ensemble, which aims to learn a robust consensus clustering from multiple weak base clusterings, has achieved promising performance in various applications. With the development of big data, the scale and complexity of data are constantly increasing. However, most existing clustering ensemble methods employ shallow clustering algorithms to generate base clusterings. When confronted with high-dimensional complex data, these shallow algorithms fail to fully utilize the intricate features present in the latent data space. As a result, the quality and diversity of the generated base clusterings are insufficient, which degrades the subsequent ensemble performance. To address this issue, we propose a novel clustering ensemble algorithm for handling deep embeddings using cluster confidence (CEDECC) to improve robustness and performance. Instead of simply combining deep clustering with clustering ensembles, we take into consideration that the performance of existing deep clustering methods relies heavily on the quality of the low-dimensional embeddings generated during the pre-training stage. The quality of these embeddings is unstable because it depends on the initialization parameters. Specifically, in CEDECC we first construct a cluster confidence measure to evaluate the quality of low-dimensional embeddings. Typically, high-quality low-dimensional embeddings yield accurate clustering results with the same model parameters. Then, we utilize multiple high-quality embeddings to generate the base partitions. In the ensemble strategy phase, we consider the cluster-wise diversity and propose a novel ensemble cluster estimation to improve the overall consensus performance of the model. Extensive experiments on three benchmark datasets and four real-world biological datasets demonstrate that the proposed CEDECC consistently outperforms state-of-the-art clustering ensemble methods.
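
The select-then-combine pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not define the cluster confidence measure or the ensemble cluster estimation, so this is not the CEDECC method: the `confidence` scores are hypothetical placeholder values, and the consensus step is the classical co-association (evidence accumulation) approach, swapped in purely for illustration. All function names here are assumptions.

```python
from typing import List


def select_top(partitions: List[List[int]], scores: List[float], keep: int) -> List[List[int]]:
    """Keep the `keep` base partitions with the highest confidence scores."""
    order = sorted(range(len(partitions)), key=lambda i: scores[i], reverse=True)
    return [partitions[i] for i in order[:keep]]


def co_association(partitions: List[List[int]]) -> List[List[float]]:
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(partitions)
    return co


def consensus(co: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Label connected components of the graph linking pairs with co > threshold."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy base partitions of six points (hypothetical data, not from the paper).
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as above, different label names
    [0, 0, 1, 1, 2, 2],  # partially agrees with the first two
    [0, 1, 0, 1, 0, 1],  # low-quality partition
]
# Hypothetical confidence scores standing in for the paper's cluster confidence.
confidence = [0.9, 0.8, 0.6, 0.1]

selected = select_top(base_partitions, confidence, keep=3)
co = co_association(selected)
result = consensus(co)
print(result)  # [0, 0, 0, 1, 1, 1]
```

In this toy run the lowest-scoring partition is discarded before the co-association matrix is built, so the two majority groupings dominate the consensus.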
Pages: 163-174
Page count: 12