A clustering ensemble algorithm for handling deep embeddings using cluster confidence

Cited by: 0
Authors
Zeng, Lingbin [1]
Yao, Shixin [1]
Liu, Xinwang [1]
Xiao, Liquan [1]
Qian, Yue [1]
Affiliations
[1] Natl Univ Def Technol, 109 Deya Rd, Changsha, Hunan, Peoples R China
Source
COMPUTER JOURNAL | 2024 / Vol. 68 / No. 02
Funding
National Key Research and Development Program of China;
Keywords
DOI
10.1093/comjnl/bxae101
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Clustering ensemble, which aims to learn a robust consensus clustering from multiple weak base clusterings, has achieved promising performance in various applications. With the development of big data, the scale and complexity of data are constantly increasing. However, most existing clustering ensemble methods employ shallow clustering algorithms to generate base clusterings. When confronted with high-dimensional complex data, these shallow algorithms fail to fully exploit the intricate features present in the latent data space. As a result, the quality and diversity of the generated base clusterings are insufficient, which degrades the subsequent ensemble performance. To address this issue, we propose a novel clustering ensemble algorithm for handling deep embeddings using cluster confidence (CEDECC) to improve robustness and performance. Rather than simply combining deep clustering with clustering ensembles, we take into account that the performance of existing deep clustering methods relies heavily on the quality of the low-dimensional embeddings generated during the pre-training stage, and that this quality is unstable because it depends on the initialization parameters. Specifically, in CEDECC we first construct a cluster confidence measure to evaluate the quality of low-dimensional embeddings. Typically, high-quality low-dimensional embeddings yield accurate clustering results with the same model parameters. Then, we use multiple high-quality embeddings to generate the base partitions. In the ensemble strategy phase, we consider cluster-wise diversity and propose a novel ensemble cluster estimation to improve the overall consensus performance of the model. Extensive experiments on three benchmark datasets and four real-world biological datasets demonstrate that the proposed CEDECC consistently outperforms state-of-the-art clustering ensemble methods.
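The general idea of combining several base partitions into one consensus clustering can be illustrated with a minimal sketch. This is not the CEDECC algorithm: the abstract does not specify the cluster confidence measure or the ensemble cluster estimation, so the sketch swaps in a plain weighted co-association (evidence accumulation) consensus. The function names, the `weights` argument (a hypothetical stand-in for a per-partition quality score), and the `threshold` value are all assumptions made for illustration.

```python
import numpy as np


def co_association(partitions, weights=None):
    """Weighted fraction of base partitions in which each pair of points
    is assigned to the same cluster."""
    partitions = np.asarray(partitions)
    n_partitions, n_points = partitions.shape
    if weights is None:
        weights = np.ones(n_partitions)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so entries lie in [0, 1]
    ca = np.zeros((n_points, n_points))
    for labels, w in zip(partitions, weights):
        # Boolean matrix: True where points i and j share a cluster label.
        ca += w * (labels[:, None] == labels[None, :])
    return ca


def consensus_labels(ca, threshold=0.5):
    """Consensus clusters as connected components of the graph that links
    every pair whose co-association exceeds `threshold` (hypothetical value)."""
    n_points = ca.shape[0]
    labels = -np.ones(n_points, dtype=int)  # -1 marks unassigned points
    current = 0
    for start in range(n_points):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            neighbors = np.flatnonzero((ca[i] > threshold) & (labels == -1))
            for j in neighbors:
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


# Three toy base partitions of six points; the third is noisier and is
# given a lower (assumed) quality weight.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
weights = [1.0, 1.0, 0.5]

ca = co_association(partitions, weights)
labels = consensus_labels(ca)
print(labels)
```

Because the co-association matrix compares labels only within each partition, the first two partitions agree fully even though their label names are swapped. Down-weighting the third partition keeps its disagreement from splitting the two groups.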
Pages: 163-174
Number of pages: 12