An evolutionary clustering algorithm for gene expression microarray data analysis

Cited by: 71
Authors
Ma, Patrick C. H. [1]
Chan, Keith C. C.
Yao, Xin
Chiu, David K. Y.
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
[2] Univ Birmingham, Sch Comp Sci, CERCIA, Birmingham B15 2TT, W Midlands, England
[3] Univ Guelph, Biophys Interdept Grp, Guelph, ON N1G 2W1, Canada
[4] Univ Guelph, Dept Comp & Informat Sci, Guelph, ON N1G 2W1, Canada
Keywords
bioinformatics; clustering; DNA sequence analysis; evolutionary algorithms (EAs); gene expression microarray data analysis;
DOI
10.1109/TEVC.2005.859371
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. For more effective clustering of gene expression microarray data, which is typically characterized by considerable noise, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on this encoding scheme, it makes use of a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that EvoCluster adopts is able to assess how relevant each feature value is in determining a particular cluster grouping. As such, instead of considering only local pairwise distances, it also takes into account how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation, even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.
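The encoding described in the abstract, where a chromosome holds an entire cluster grouping and each gene encodes one cluster, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `random_chromosome`, `inject_cluster`, and `is_partition` are hypothetical, and the grouping-exchange operator shown is a generic cluster-injection step assumed here for illustration, not one of EvoCluster's actual reproduction operators. The fitness function is not sketched at all, since the abstract does not specify it.

```python
import random
from typing import List, Set

# Assumption: a chromosome is a list of genes, and each gene is one cluster,
# stored as the set of record indices assigned to it.
Chromosome = List[Set[int]]


def random_chromosome(n_records: int, max_clusters: int, rng: random.Random) -> Chromosome:
    """Build a random grouping; the number of clusters is itself random."""
    k = rng.randint(2, max_clusters)
    clusters: Chromosome = [set() for _ in range(k)]
    for record in range(n_records):
        clusters[rng.randrange(k)].add(record)
    return [c for c in clusters if c]  # drop clusters that stayed empty


def inject_cluster(receiver: Chromosome, donor: Chromosome, rng: random.Random) -> Chromosome:
    """Hypothetical exchange operator: copy one whole cluster from donor into receiver.

    Records in the donated cluster are removed from the receiver's own clusters,
    so the child remains a valid partition. The cluster count may change.
    """
    donated = set(rng.choice(donor))
    child = [c - donated for c in receiver]
    child = [c for c in child if c]  # drop clusters emptied by the removal
    child.append(donated)
    return child


def is_partition(chromosome: Chromosome, n_records: int) -> bool:
    """Check that every record appears in exactly one cluster."""
    seen = [r for c in chromosome for r in c]
    return len(seen) == n_records and set(seen) == set(range(n_records))


rng = random.Random(0)
parent_a = random_chromosome(20, 5, rng)
parent_b = random_chromosome(20, 5, rng)
child = inject_cluster(parent_a, parent_b, rng)
```

Because each gene is a whole cluster rather than a per-record label, the number of clusters is free to vary between chromosomes, which fits the abstract's remark that the cluster count need not be fixed in advance.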
Pages: 296-314
Number of pages: 19
Related papers
50 in total
  • [31] The Clustering Algorithm Study of Gene Expression Data
    He Rui
    Lin Chunmei
    ENVIRONMENTAL BIOTECHNOLOGY AND MATERIALS ENGINEERING, PTS 1-3, 2011, 183-185 : 93 - +
  • [32] Constrained Competitive Learning Algorithm for DNA Microarray Gene Expression Data Analysis
    Wu, Shuanhu
    Yan, Hong
    Zeng, Qingshang
    Zhang, Yanjie
    Song, Yibin
    INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION, VOL 1, PROCEEDINGS, 2008, : 44 - +
  • [33] Algorithm for Clustering Analysis of Gene Expression Data using MapReduce Framework
    Priya, P. Packia Amutha
    Lawrance, R.
    2016 INTERNATIONAL CONFERENCE ON COMPUTING TECHNOLOGIES AND INTELLIGENT DATA ENGINEERING (ICCTIDE'16), 2016,
  • [34] Incorporating gene ontology into fuzzy relational clustering of microarray gene expression data
    Paul, Animesh Kumar
    Shill, Pintu Chandra
    BIOSYSTEMS, 2018, 163 : 1 - 10
  • [35] A novel clustering algorithm by clubbing GHFCM and GWO for microarray gene data
    P. Edwin Dhas
    B. Sankara Gomathi
    The Journal of Supercomputing, 2020, 76 : 5679 - 5693
  • [36] A novel clustering algorithm by clubbing GHFCM and GWO for microarray gene data
    Edwin Dhas, P.
    Sankara Gomathi, B.
    JOURNAL OF SUPERCOMPUTING, 2020, 76 (08): : 5679 - 5693
  • [37] K-Boost: A Scalable Algorithm for High-Quality Clustering of Microarray Gene Expression Data
    Geraci, Filippo
    Leoncini, Mauro
    Montangero, Manuela
    Pellegrini, Marco
    Renda, M. Elena
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2009, 16 (06) : 859 - 873
  • [38] Advanced Soft-Computing techniques and Clustering Algorithm for Gene Expression Microarray Data Classification.
    Valenzuela, Olga
    Rojas, Fernando
    Ortuno, Francisco
    Luis Bernier, Jose
    Jose Saez, M.
    San-Roman, Belen
    Javier Herrera, Luis
    Guillen, Alberto
    Rojas, Ignacio
    PROCEEDINGS IWBBIO 2014: INTERNATIONAL WORK-CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING, VOLS 1 AND 2, 2014, : 1634 - 1643
  • [39] Analysis of gene expression data: Application of quantum-inspired evolutionary algorithm to minimum sum-of-squares clustering
    Zhou, WG
    Zhou, CG
    Huang, YX
    Wang, Y
    ROUGH SETS, FUZZY SETS, DATA MINING, AND GRANULAR COMPUTING, PT 2, PROCEEDINGS, 2005, 3642 : 383 - 391
  • [40] Novel clustering algorithm for microarray expression data in a truncated SVD space
    Horn, D
    Axel, I
    BIOINFORMATICS, 2003, 19 (09) : 1110 - 1115