A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data

Cited by: 16
Authors
Li, Li [1, 2, 4, 5]
Guo, Yang [3, 5]
Wu, Wenwu [1, 2, 3, 5]
Shi, Youyi [1, 2, 4, 5]
Cheng, Jian [1, 2, 3, 5]
Tao, Shiheng [1, 2, 3, 5]
Affiliations
[1] Northwest A&F Univ, State Key Lab Crop Stress Biol Arid Areas, Yangling 712100, Shaanxi, Peoples R China
[2] Northwest A&F Univ, Coll Sci, Yangling 712100, Shaanxi, Peoples R China
[3] Northwest A&F Univ, Coll Life Sci, Yangling 712100, Shaanxi, Peoples R China
[4] Northwest A&F Univ, Inst Appl Math, Yangling 712100, Shaanxi, Peoples R China
[5] Northwest A&F Univ, Bioinformat Ctr, Yangling 712100, Shaanxi, Peoples R China
Source
BIODATA MINING | 2012 / Vol. 5
Keywords
MICROARRAY DATA-ANALYSIS; PROTEINS;
DOI
10.1186/1756-0381-5-8
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Several biclustering algorithms have been proposed to identify biclusters, in which genes share similar expression patterns across a number of conditions. However, different algorithms yield different biclusters and can therefore lead to distinct conclusions, so testing and comparison of these algorithms is strongly required.
Methods: In this study, five biclustering algorithms (BIMAX, FABIA, ISA, QUBIC and SAMBA) were compared on two Arabidopsis thaliana (A. thaliana) expression datasets of different dimensions (GDS1620 and pathway). GO (gene ontology) annotation and the PPI (protein-protein interaction) network were used to verify the biological significance of the biclusters from the five algorithms. To compare the algorithms' performance and evaluate the quality of the identified biclusters, two scoring methods, weighted enrichment (WE) scoring and PPI scoring, were proposed. For each dataset, the scores of all biclusters were combined into one unified ranking, which was used to evaluate the performance and behavior of the five algorithms.
Results: Both the WE and PPI scoring methods proved effective for validating the biological significance of the biclusters, and a significantly positive correlation between the two sets of scores demonstrated the consistency of the two methods. The comparative study revealed that: (1) ISA is the most effective of the five algorithms on the GDS1620 dataset, and BIMAX outperforms the other algorithms on the pathway dataset. (2) Both ISA and BIMAX are data-dependent: the former does not work well on datasets with few genes, while the latter holds up well on datasets with more conditions. (3) FABIA and QUBIC perform poorly in this study and may be better suited to large datasets with more genes and more conditions. (4) SAMBA is data-independent, as it performs well on both datasets. The comparison results provide useful information for researchers choosing a suitable algorithm for a given dataset.
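The abstract reports a significantly positive correlation between the WE and PPI scores but does not say which correlation measure was used. As a minimal sketch, assuming a Spearman rank correlation and using invented placeholder scores (not values from the paper), such a consistency check could look like this:

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five biclusters (illustrative only).
we_scores = [0.9, 0.4, 0.7, 0.1, 0.5]
ppi_scores = [0.8, 0.3, 0.9, 0.2, 0.4]

rho = spearman(we_scores, ppi_scores)
print(f"Spearman rho between WE and PPI scores: {rho:.3f}")
```

A coefficient close to 1 would indicate that the two scoring methods rank the biclusters similarly; assessing significance would additionally need a test such as a permutation test, which is omitted here.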
Pages: 10
Related papers
50 in total
  • [31] An evolutionary approach for biclustering of gene expression data
    Sheta, Walaa
    Hany, Maha
    Mahdi, Shereef
    INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, 2010, 2 (06) : 413 - 421
  • [32] Extraction of Optimal Biclusters from Gene Expression Data
    Bagyamani, J.
    Thangavel, K.
    Rathipriya, R.
    INFORMATION AND COMMUNICATION TECHNOLOGIES, 2010, 101 : 380 - +
  • [33] Evolutionary Biclustering Algorithm of Gene Expression Data
    Ayadi, Wassim
    Maatouk, Ons
    Bouziri, Hend
    2012 23RD INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA), 2012, : 206 - 210
  • [34] Rough overlapping biclustering of gene expression data
    Wang, Ruizhi
    Miao, Duoqian
    Li, Gang
    Zhang, Hongyun
    PROCEEDINGS OF THE 7TH IEEE INTERNATIONAL SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING, VOLS I AND II, 2007, : 828 - 834
  • [35] An improved biclustering algorithm for gene expression data
    Jin, Sheng-Hua
    Hua, Li
    Open Cybernetics and Systemics Journal, 2014, 8 : 1141 - 1144
  • [36] Biclustering gene expression data in the presence of noise
    Abdullah, A
    Hussain, A
    ARTIFICIAL NEURAL NETWORKS: BIOLOGICAL INSPIRATIONS - ICANN 2005, PT 1, PROCEEDINGS, 2005, 3696 : 611 - 616
  • [37] An EA framework for biclustering of gene expression data
    Bleuler, S
    Preli, A
    Zitzler, E
    CEC2004: PROCEEDINGS OF THE 2004 CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1 AND 2, 2004, : 166 - 173
  • [38] An improved biclustering algorithm for gene expression data
    Jin, Sheng-Hua
    Hua, Li
    Open Cybernetics and Systemics Journal, 2014, 8 (01) : 1141 - 1144
  • [39] Biclustering of gene expression data by simulated annealing
    Chakraborty, Anupam
    EIGHTH INTERNATIONAL CONFERENCE ON HIGH-PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION, PROCEEDINGS, 2005, : 627 - 632
  • [40] Comparative Analysis and Evaluation of Biclustering Algorithms for Microarray Data
    Maind, Ankush
    Raut, Shital
    NETWORKING COMMUNICATION AND DATA KNOWLEDGE ENGINEERING, VOL 2, 2018, 4 : 159 - 171