Semi-Supervised Clustering Ensemble Based on Cluster Consensus Selection

Cited by: 0
Authors
Liu, Yanxi [1, 3]
Al-Khafaji, Ali Hussein Demin [2]
Affiliations
[1] Anshan Normal Univ, Informat Ctr, Anshan, Liaoning, Peoples R China
[2] Al Mustaqbal Univ Coll, Dept Labs, Tech, Babylon, Hillah, Iraq
[3] Anshan Normal Univ, Informat Ctr, Anshan 114007, Liaoning, Peoples R China
Keywords
Consensus selection; ensemble clustering; NMI; semi-supervised clustering
DOI
10.1080/01969722.2022.2159150
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Ensemble clustering is an important extension of classical clustering and one of the more recent advances in unsupervised learning. Its purpose is to combine, through a consensus function, the results obtained from different algorithms so that the final solution is better than the result of any individual clustering algorithm. In this study, we propose a semi-supervised clustering ensemble framework based on cluster consensus selection, with the aim of improving the accuracy of the clustering results. Semi-supervised clustering algorithms generally fall into two types: constraint-based and metric-based. The proposed ensemble clustering algorithm uses a semi-supervised mechanism based on pairwise constraints. Because the complexity of consensus functions scales with the number of clustering methods, ensemble clustering of big data is sometimes slow or infeasible. Usually, all primary clusters from all clustering methods are passed to the consensus function. However, taking the merit of the clusters produced by different methods into account can improve consensus quality. Accordingly, we propose a cluster consensus selection approach that selects a subset of high-merit primary clusters to participate in the final consensus. The merit of clusters is measured with a criterion based on Normalized Mutual Information (NMI). Reducing the number of primary clusters in the consensus function also makes big data clustering feasible. The proposed algorithm is computationally efficient, with linear complexity in clustering. Experimental results show the effectiveness of the proposed algorithm on several performance metrics, including NMI, ARI and CPCC.
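The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of NMI-based selection, not the authors' algorithm. It assumes that merit is scored per base partition as the mean NMI against the other partitions (the paper scores individual clusters), and the names `entropy`, `nmi`, and `select_partitions` are hypothetical helpers introduced for illustration.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI of two labelings, normalized by the geometric mean of entropies."""
    n = len(a)
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        # Convention assumed here: two trivial partitions agree perfectly.
        return 1.0 if ha == hb else 0.0
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * log(c * n / (ca[x] * cb[y]))
             for (x, y), c in joint.items())
    return mi / sqrt(ha * hb)


def select_partitions(partitions, keep):
    """Score each partition by mean NMI with the others; keep the top `keep`.

    Returns the sorted indices of the kept partitions and all scores.
    Assumes at least two partitions are given.
    """
    scores = []
    for i, p in enumerate(partitions):
        others = [nmi(p, q) for j, q in enumerate(partitions) if j != i]
        scores.append(sum(others) / len(others))
    ranked = sorted(range(len(partitions)),
                    key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep]), scores


# Three base partitions of four objects; the third disagrees with the rest.
base = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
kept, merit = select_partitions(base, keep=2)
```

Only the partitions indexed by `kept` would then be passed to a consensus function, which is the step where a smaller input is expected to reduce cost. Pairwise constraints (must-link and cannot-link) are not modeled in this sketch.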
Pages: 29