A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data

Cited by: 16
Authors
Li, Li [1 ,2 ,4 ,5 ]
Guo, Yang [3 ,5 ]
Wu, Wenwu [1 ,2 ,3 ,5 ]
Shi, Youyi [1 ,2 ,4 ,5 ]
Cheng, Jian [1 ,2 ,3 ,5 ]
Tao, Shiheng [1 ,2 ,3 ,5 ]
Affiliations
[1] Northwest A&F Univ, State Key Lab Crop Stress Biol Arid Areas, Yangling 712100, Shaanxi, Peoples R China
[2] Northwest A&F Univ, Coll Sci, Yangling 712100, Shaanxi, Peoples R China
[3] Northwest A&F Univ, Coll Life Sci, Yangling 712100, Shaanxi, Peoples R China
[4] Northwest A&F Univ, Inst Appl Math, Yangling 712100, Shaanxi, Peoples R China
[5] Northwest A&F Univ, Bioinformat Ctr, Yangling 712100, Shaanxi, Peoples R China
Source
BIODATA MINING | 2012 / Vol. 5
Keywords
MICROARRAY DATA-ANALYSIS; PROTEINS;
DOI
10.1186/1756-0381-5-8
Chinese Library Classification
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Several biclustering algorithms have been proposed to identify biclusters, in which genes share similar expression patterns across a number of conditions. However, different algorithms yield different biclusters and can therefore lead to distinct conclusions, so systematic testing and comparison of these algorithms is needed. Methods: In this study, five biclustering algorithms (BIMAX, FABIA, ISA, QUBIC and SAMBA) were compared on two Arabidopsis thaliana (A. thaliana) expression datasets of different dimensions (GDS1620 and pathway). GO (gene ontology) annotation and the PPI (protein-protein interaction) network were used to verify the biological significance of the biclusters from the five algorithms. To compare the algorithms' performance and evaluate the quality of the identified biclusters, two scoring methods, weighted enrichment (WE) scoring and PPI scoring, were proposed. For each dataset, the scores of all biclusters were combined into one unified ranking, which was used to evaluate the performance and behavior of the five algorithms. Results: Both the WE and PPI scoring methods proved effective for validating the biological significance of biclusters, and a significant positive correlation between the two sets of scores demonstrated the consistency of the two methods. The comparative study revealed that: (1) ISA is the most effective of the five algorithms on the GDS1620 dataset, and BIMAX outperforms the other algorithms on the pathway dataset. (2) Both ISA and BIMAX are data-dependent: the former does not work well on datasets with few genes, while the latter works well on datasets with more conditions. (3) FABIA and QUBIC perform poorly in this study; they may be better suited to large datasets with more genes and more conditions. (4) SAMBA is data-independent, as it performs well on both datasets. The comparison results provide useful information for researchers choosing a suitable algorithm for a given dataset.
Pages: 10
Related papers
50 in total
  • [21] Biclustering in gene expression data by tendency
    Liu, JZ
    Yang, J
    Wang, W
    [J]. 2004 IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE, PROCEEDINGS, 2004, : 182 - 193
  • [22] A review on biclustering of gene expression microarray data: algorithms, effective measures and validations
    Biswal, Bhawani Sankar
    Mohapatra, Anjali
    Vipsita, Swati
    [J]. INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2018, 21 (03) : 230 - 268
  • [23] Bayesian biclustering of gene expression data
    Gu, Jiajun
    Liu, Jun S.
    [J]. BMC GENOMICS, 2008, 9 (Suppl 1)
  • [24] Constructing gene network based on biclusters of expression data
    Liu, F.
    Yang, L.
    Tian, Z. Z.
    Wu, P.
    Sun, S. L.
    [J]. GENETICS AND MOLECULAR RESEARCH, 2016, 15 (02)
  • [25] Exhaustive search of maximal biclusters in gene expression data
    Okada, Yoshifumi
    Fujibuchi, Wataru
    Horton, Paul
    [J]. IMECS 2007: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2007, : 307 - +
  • [26] BICLUSTERING ANALYSIS OF GENE EXPRESSION DATA USING MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS
    Golchin, Maryam
    Davarpanah, Seyed Hashem
    Liew, Alan Wee-Chung
    [J]. PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOL. 2, 2015, : 505 - 510
  • [27] JBiclustGE: Java API with unified biclustering algorithms for gene expression data analysis
    Rocha, Orlando
    Mendes, Rui
    [J]. KNOWLEDGE-BASED SYSTEMS, 2018, 155 : 83 - 87
  • [28] Biological evaluation of biclustering algorithms using Gene Ontology and chIP-chip data
    Tchagang, Alain B.
    Tewfik, Ahmed H.
    Benos, Panayiotis V.
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 637 - +
  • [29] Finding Correlated Biclusters from Gene Expression Data
    Yang, Wen-Hui
    Dai, Dao-Qing
    Yan, Hong
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (04) : 568 - 584
  • [30] Biclustering of Linear Patterns In Gene Expression Data
    Gao, Qinghui
    Ho, Christine
    Jia, Yingmin
    Li, Jingyi Jessica
    Huang, Haiyan
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2012, 19 (06) : 619 - 631