LCE: a link-based cluster ensemble method for improved gene expression data analysis

Cited by: 91
Authors
Iam-on, Natthakan [1]
Boongoen, Tossapon [1, 2]
Garrett, Simon [1]
Affiliations
[1] Aberystwyth Univ, Dept Comp Sci, Aberystwyth, Ceredigion, Wales
[2] Royal Thai AF Acad, Dept Math & Comp Sci, Bangkok, Thailand
Keywords
CLASS DISCOVERY; MOLECULAR CLASSIFICATION; PREDICTION; CANCER; SUBTYPES
DOI
10.1093/bioinformatics/btq226
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Selecting the most effective clustering method and its parameterization for a particular set of gene expression data is far from trivial, because the number of possibilities is very large. Although many researchers still prefer hierarchical clustering in one form or another, this choice is often sub-optimal. Cluster ensemble research solves this problem by automatically combining multiple data partitions from different clusterings to improve both the robustness and the quality of the clustering result. However, many existing ensemble techniques use an association matrix to summarize sample-cluster co-occurrence statistics, so relations within an ensemble are captured only at a coarse level, while those among clusters are neglected entirely. Discovering these missing associations may greatly extend the capability of the ensemble methodology for microarray data clustering.

Results: The link-based cluster ensemble (LCE) method presented here implements these ideas and demonstrates outstanding performance. Experimental results on real gene expression and synthetic datasets indicate that LCE: (i) usually outperforms existing cluster ensemble algorithms in individual tests and, overall, is clearly class-leading; (ii) delivers excellent, robust performance across different types of data, especially in the presence of noise and imbalanced data clusters; (iii) provides a high-level data matrix that is applicable to many numerical clustering techniques; and (iv) is computationally efficient for large datasets and gene clustering.
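To make the "association matrix" in the abstract concrete, the following is a minimal sketch of how a conventional cluster ensemble might summarize sample-cluster co-occurrence. It is not the LCE algorithm itself: the function names, the toy label arrays, and the use of NumPy are all assumptions made for illustration, and the link-based refinement that the paper proposes is not implemented here.

```python
import numpy as np


def binary_association_matrix(partitions):
    """Stack one-hot encodings of each base partition side by side.

    partitions: list of 1-D integer label sequences, one per base
    clustering, all covering the same samples in the same order.
    Returns an (n_samples, total_clusters) matrix of 0.0/1.0 entries.
    (Hypothetical helper, not taken from the paper.)
    """
    blocks = []
    for labels in partitions:
        labels = np.asarray(labels)
        cluster_ids = np.unique(labels)
        # Column j is 1 where the sample belongs to cluster_ids[j].
        blocks.append((labels[:, None] == cluster_ids[None, :]).astype(float))
    return np.hstack(blocks)


def co_association(bam, n_partitions):
    """Fraction of base partitions in which each pair of samples co-clusters."""
    return bam @ bam.T / n_partitions


# Three toy base clusterings of six samples (illustrative data only).
partitions = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
]

bam = binary_association_matrix(partitions)
coassoc = co_association(bam, len(partitions))
```

In this sketch every row of `bam` contains exactly one 1 per base partition and zeros elsewhere. Those zeros record nothing about how close a sample is to the clusters it was not assigned to, which is the coarse-level limitation the abstract describes; per the abstract, LCE addresses it by also exploiting relations among clusters.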
Pages: 1513-1519
Page count: 7