Incremental Semi-Supervised Clustering Ensemble for High Dimensional Data Clustering

Cited by: 137
Authors
Yu, Zhiwen [1]
Luo, Peinan [1]
You, Jane [2]
Wong, Hau-San [3]
Leung, Hareton [2]
Wu, Si [1]
Zhang, Jun [4]
Han, Guoqiang [1]
Affiliations
[1] S China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[4] Sun Yat Sen Univ, Sch Adv Comp, Guangzhou 510275, Guangdong, Peoples R China
Keywords
Cluster ensemble; semi-supervised clustering; random subspace; cancer gene expression profile; clustering analysis; CLASS DISCOVERY; CONSENSUS; FRAMEWORK
DOI
10.1109/TKDE.2015.2499200
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional cluster ensemble approaches have three limitations: (1) They do not make use of prior knowledge of the datasets given by experts. (2) Most conventional cluster ensemble methods cannot obtain satisfactory results when handling high dimensional data. (3) All the ensemble members are considered, even the ones without positive contributions. To address these limitations, we first propose an incremental semi-supervised clustering ensemble framework (ISSCE) which takes advantage of the random subspace technique, the constraint propagation approach, the proposed incremental ensemble member selection process, and the normalized cut algorithm to perform high dimensional data clustering. The random subspace technique is effective for handling high dimensional data, while the constraint propagation approach is useful for incorporating prior knowledge. The incremental ensemble member selection process is newly designed to judiciously remove redundant ensemble members based on a newly proposed local cost function and a global cost function, and the normalized cut algorithm is adopted to serve as the consensus function for providing more stable, robust, and accurate results. Then, a measure is proposed to quantify the similarity between two sets of attributes, and is used for computing the local cost function in ISSCE. Next, we analyze the time complexity of ISSCE theoretically. Finally, a set of nonparametric tests is adopted to compare multiple semi-supervised clustering ensemble approaches over different datasets. The experiments on 18 real-world datasets, which include six UCI datasets and 12 cancer gene expression profiles, confirm that ISSCE works well on datasets with very high dimensionality, and outperforms the state-of-the-art semi-supervised clustering ensemble approaches.
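The abstract does not define the attribute-set similarity measure used in the local cost function. As a rough illustration only, the sketch below assumes Jaccard overlap as a stand-in for that measure and shows how random attribute subspaces might be drawn and compared. The function names and parameters are hypothetical and are not taken from the paper.

```python
import random


def random_subspace(n_attributes, subspace_size, rng):
    """Draw a random subset of attribute indices without replacement."""
    return frozenset(rng.sample(range(n_attributes), subspace_size))


def jaccard_similarity(a, b):
    """Assumed stand-in measure: |a & b| / |a | b| for two attribute sets."""
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(union)


def pairwise_similarities(subspaces):
    """Return {(i, j): similarity} for every pair of subspaces with i < j."""
    sims = {}
    for i in range(len(subspaces)):
        for j in range(i + 1, len(subspaces)):
            sims[(i, j)] = jaccard_similarity(subspaces[i], subspaces[j])
    return sims


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Five ensemble members, each seeing 20 of 200 attributes (illustrative sizes).
    subspaces = [random_subspace(200, 20, rng) for _ in range(5)]
    for pair, sim in sorted(pairwise_similarities(subspaces).items()):
        print(pair, round(sim, 3))
```

A high similarity between two subspaces suggests that the corresponding ensemble members see largely the same attributes, which is one plausible signal of redundancy; how ISSCE actually combines this with its local and global cost functions is not stated in the abstract.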
Pages: 701-714
Page count: 14