A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data

Cited by: 16
Authors
Li, Li [1, 2, 4, 5]
Guo, Yang [3, 5]
Wu, Wenwu [1, 2, 3, 5]
Shi, Youyi [1, 2, 4, 5]
Cheng, Jian [1, 2, 3, 5]
Tao, Shiheng [1, 2, 3, 5]
Affiliations
[1] Northwest A&F Univ, State Key Lab Crop Stress Biol Arid Areas, Yangling 712100, Shaanxi, Peoples R China
[2] Northwest A&F Univ, Coll Sci, Yangling 712100, Shaanxi, Peoples R China
[3] Northwest A&F Univ, Coll Life Sci, Yangling 712100, Shaanxi, Peoples R China
[4] Northwest A&F Univ, Inst Appl Math, Yangling 712100, Shaanxi, Peoples R China
[5] Northwest A&F Univ, Bioinformat Ctr, Yangling 712100, Shaanxi, Peoples R China
Source
BIODATA MINING | 2012 / Vol. 5
Keywords
MICROARRAY DATA-ANALYSIS; PROTEINS;
DOI
10.1186/1756-0381-5-8
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Several biclustering algorithms have been proposed to identify biclusters, in which genes share similar expression patterns across a number of conditions. However, different algorithms yield different biclusters and can therefore lead to distinct conclusions, so testing and comparison of these algorithms are strongly required.
Methods: In this study, five biclustering algorithms (BIMAX, FABIA, ISA, QUBIC and SAMBA) were compared on two Arabidopsis thaliana (A. thaliana) expression datasets of different dimensions (GDS1620 and pathway). GO (gene ontology) annotation and the PPI (protein-protein interaction) network were used to verify the biological significance of the biclusters from the five algorithms. To compare the algorithms' performance and evaluate the quality of the identified biclusters, two scoring methods, weighted enrichment (WE) scoring and PPI scoring, were proposed. For each dataset, the scores of all biclusters were combined into one unified ranking to better evaluate the performance and behavior of the five algorithms.
Results: Both the WE and PPI scoring methods proved effective for validating the biological significance of the biclusters, and a significant positive correlation between the two sets of scores demonstrated the consistency of the two methods. The comparative study revealed that: (1) ISA is the most effective of the five algorithms on the GDS1620 dataset, and BIMAX outperforms the other algorithms on the pathway dataset. (2) Both ISA and BIMAX are data-dependent: the former does not work well on datasets with few genes, while the latter works well on datasets with more conditions. (3) FABIA and QUBIC perform poorly in this study and may be better suited to large datasets with more genes and more conditions. (4) SAMBA appears to be data-independent, as it performs well on both datasets. The comparison results provide useful information for researchers choosing a suitable algorithm for a given dataset.
Pages: 10