A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data

Cited by: 16
Authors
Li, Li [1, 2, 4, 5]
Guo, Yang [3, 5]
Wu, Wenwu [1, 2, 3, 5]
Shi, Youyi [1, 2, 4, 5]
Cheng, Jian [1, 2, 3, 5]
Tao, Shiheng [1, 2, 3, 5]
Affiliations
[1] Northwest A&F Univ, State Key Lab Crop Stress Biol Arid Areas, Yangling 712100, Shaanxi, Peoples R China
[2] Northwest A&F Univ, Coll Sci, Yangling 712100, Shaanxi, Peoples R China
[3] Northwest A&F Univ, Coll Life Sci, Yangling 712100, Shaanxi, Peoples R China
[4] Northwest A&F Univ, Inst Appl Math, Yangling 712100, Shaanxi, Peoples R China
[5] Northwest A&F Univ, Bioinformat Ctr, Yangling 712100, Shaanxi, Peoples R China
Source
BIODATA MINING | 2012 / Volume 5
Keywords
MICROARRAY DATA-ANALYSIS; PROTEINS;
DOI
10.1186/1756-0381-5-8
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Several biclustering algorithms have been proposed to identify biclusters, in which genes share similar expression patterns across a number of conditions. However, different algorithms yield different biclusters and can therefore lead to different conclusions, so these algorithms need to be tested and compared. Methods: In this study, five biclustering algorithms (BIMAX, FABIA, ISA, QUBIC and SAMBA) were compared on two Arabidopsis thaliana (A. thaliana) expression datasets of different dimensions (GDS1620 and pathway). GO (gene ontology) annotation and the PPI (protein-protein interaction) network were used to verify the biological significance of the biclusters produced by the five algorithms. To compare the algorithms' performance and evaluate the quality of the identified biclusters, two scoring methods, weighted enrichment (WE) scoring and PPI scoring, were proposed. For each dataset, the scores of all biclusters were combined into one unified ranking, so that the performance and behavior of the five algorithms could be evaluated more effectively. Results: Both the WE and PPI scoring methods proved effective in validating the biological significance of the biclusters, and a significant positive correlation between the two sets of scores demonstrated the consistency of the two methods. The comparison of the five algorithms revealed that: (1) ISA is the most effective of the five algorithms on the GDS1620 dataset, and BIMAX outperforms the other algorithms on the pathway dataset. (2) Both ISA and BIMAX are data-dependent: the former does not work well on datasets with few genes, while the latter works well on datasets with more conditions. (3) FABIA and QUBIC perform poorly in this study and may be better suited to large datasets with more genes and more conditions. (4) SAMBA, by contrast, is data-independent, as it performs well on both datasets. The comparison results provide useful information for researchers choosing a suitable algorithm for a given dataset.
Pages: 10
Related papers
50 in total
  • [1] A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data
    Li Li
    Yang Guo
    Wenwu Wu
    Youyi Shi
    Jian Cheng
    Shiheng Tao
    [J]. BioData Mining, 5
  • [2] Differential co-expression framework to quantify goodness of biclusters and compare biclustering algorithms
    Chia, Burton Kuan Hui
    Karuturi, R. Krishna Murthy
    [J]. ALGORITHMS FOR MOLECULAR BIOLOGY, 2010, 5
  • [3] Differential co-expression framework to quantify goodness of biclusters and compare biclustering algorithms
    Burton Kuan Hui Chia
    R Krishna Murthy Karuturi
    [J]. Algorithms for Molecular Biology, 5
  • [4] A systematic comparison and evaluation of biclustering methods for gene expression data
    Prelic, A
    Bleuler, S
    Zimmermann, P
    Wille, A
    Bühlmann, P
    Gruissem, W
    Hennig, L
    Thiele, L
    Zitzler, E
    [J]. BIOINFORMATICS, 2006, 22 (09) : 1122 - 1129
  • [5] On Evolutionary Algorithms for Biclustering of Gene Expression Data
    Carballido Jessica, A.
    Gallo Cristian, A.
    Dussaut Julieta, S.
    Ignacio, Ponzoni
    [J]. CURRENT BIOINFORMATICS, 2015, 10 (03) : 259 - 267
  • [6] Comparison Analysis of Biclustering Algorithms with the Use of Artificial Data and Gene Expression Profiles
    Babichev, S.
    Lytvynenko, V.
    Voronenko, M.
    Osypenko, V.
    Korobchynskyi, M.
    [J]. 2018 IEEE 38TH INTERNATIONAL CONFERENCE ON ELECTRONICS AND NANOTECHNOLOGY (ELNANO), 2018, : 298 - 304
  • [7] Comparison of sparse biclustering algorithms for gene expression datasets
    Nicholls, Kath
    Wallace, Chris
    [J]. BRIEFINGS IN BIOINFORMATICS, 2021, 22 (06)
  • [8] A comparative analysis of biclustering algorithms for gene expression data
    Eren, Kemal
    Deveci, Mehmet
    Kucuktunc, Onur
    Catalyurek, Umit V.
    [J]. BRIEFINGS IN BIOINFORMATICS, 2013, 14 (03) : 279 - 292
  • [9] An evaluation study of biclusters visualization techniques of gene expression data
    Aouabed, Haithem
    Elloumi, Mourad
    Santamaria, Rodrigo
    [J]. JOURNAL OF INTEGRATIVE BIOINFORMATICS, 2021, 18 (04)
  • [10] Comparison of BiClusO with Five Different Biclustering Algorithms Using Biological and Synthetic Data
    Karim, Mohammad Bozlul
    Kanaya, Shigehiko
    Altaf-Ul Amin, Md
    [J]. COMPLEX NETWORKS AND THEIR APPLICATIONS VII, VOL 2, 2019, 813 : 575 - 585