A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data

Times cited: 16

Authors:

Li, Li [1,2,4,5]

Guo, Yang [3,5]

Wu, Wenwu [1,2,3,5]

Shi, Youyi [1,2,4,5]

Cheng, Jian [1,2,3,5]

Tao, Shiheng [1,2,3,5]

Affiliations:

[1] Northwest A&F Univ, State Key Lab Crop Stress Biol Arid Areas, Yangling 712100, Shaanxi, Peoples R China

[2] Northwest A&F Univ, Coll Sci, Yangling 712100, Shaanxi, Peoples R China

[3] Northwest A&F Univ, Coll Life Sci, Yangling 712100, Shaanxi, Peoples R China

[4] Northwest A&F Univ, Inst Appl Math, Yangling 712100, Shaanxi, Peoples R China

[5] Northwest A&F Univ, Bioinformat Ctr, Yangling 712100, Shaanxi, Peoples R China

Source:

BIODATA MINING | 2012 / Vol. 5

Keywords:

MICROARRAY DATA-ANALYSIS; PROTEINS

DOI:

10.1186/1756-0381-5-8

Chinese Library Classification (CLC) number:

Q [Biological sciences]

Subject classification codes:

07; 0710; 09

Abstract:

Background: Several biclustering algorithms have been proposed to identify biclusters, i.e., subsets of genes that share similar expression patterns across a subset of conditions. However, different algorithms yield different biclusters and can therefore lead to different conclusions, so systematic testing and comparison of these algorithms is needed.

Methods: In this study, five biclustering algorithms (BIMAX, FABIA, ISA, QUBIC and SAMBA) were compared on two Arabidopsis thaliana (A. thaliana) expression datasets of different dimensions (GDS1620 and pathway). GO (Gene Ontology) annotation and a PPI (protein-protein interaction) network were used to verify the biological significance of the biclusters produced by the five algorithms. To compare the algorithms' performance and evaluate the quality of the identified biclusters, we proposed two scoring methods: weighted enrichment (WE) scoring and PPI scoring. For each dataset, the scores of all biclusters were combined into one unified ranking, which allowed a more direct evaluation of the performance and behavior of the five algorithms.

Results: Both the WE and PPI scoring methods proved effective in validating the biological significance of the biclusters, and a statistically significant positive correlation between the two sets of scores demonstrated the consistency of the two methods. The comparison of the five algorithms revealed that: (1) ISA is the most effective of the five algorithms on the GDS1620 dataset, while BIMAX outperforms the others on the pathway dataset. (2) Both ISA and BIMAX are data-dependent: ISA does not work well on datasets with few genes, whereas BIMAX works well on datasets with more conditions. (3) FABIA and QUBIC perform poorly in this study; they may be better suited to large datasets with more genes and more conditions. (4) SAMBA, by contrast, is data-independent, as it performs well on both datasets. These results can help researchers choose a suitable algorithm for a given dataset.
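The abstract does not define how the WE and PPI scores are computed, how the unified ranking is summarized per algorithm, or which correlation test was used. The Python sketch below illustrates only the ranking and correlation steps, under stated assumptions: each bicluster already carries a WE score and a PPI score (the key names `we`, `ppi` and `algorithm` are hypothetical); biclusters from all algorithms are pooled and ranked by one score, with tied scores sharing their mean rank; each algorithm is summarized by the mean rank of its biclusters; and agreement between the two scores is measured with Spearman's rank correlation. These are illustrative choices, not necessarily the authors' procedure, and the scores in the example are made-up toy values, not results from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values in descending order (1 = highest score); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (sx * sy)


def mean_rank_by_algorithm(biclusters, key):
    """Pool every bicluster into one ranking by `key`, then average each algorithm's ranks."""
    ranks = average_ranks([b[key] for b in biclusters])
    per_algorithm = {}
    for bicluster, rank in zip(biclusters, ranks):
        per_algorithm.setdefault(bicluster["algorithm"], []).append(rank)
    return {name: mean(rs) for name, rs in per_algorithm.items()}


# Toy scores for illustration only; these are NOT values reported in the study.
biclusters = [
    {"algorithm": "ISA", "we": 8.1, "ppi": 0.62},
    {"algorithm": "ISA", "we": 5.4, "ppi": 0.40},
    {"algorithm": "BIMAX", "we": 6.0, "ppi": 0.55},
    {"algorithm": "SAMBA", "we": 2.2, "ppi": 0.10},
    {"algorithm": "QUBIC", "we": 1.0, "ppi": 0.15},
]

print(mean_rank_by_algorithm(biclusters, "we"))
# -> {'ISA': 2.0, 'BIMAX': 2.0, 'SAMBA': 4.0, 'QUBIC': 5.0}
print(spearman([b["we"] for b in biclusters], [b["ppi"] for b in biclusters]))
# -> approximately 0.9
```

A lower mean pooled rank is better here. This is only one possible summary; counting how many of an algorithm's biclusters fall in the top k of the unified ranking would be an equally plausible alternative. On Python 3.12 and later, `statistics.correlation(x, y, method="ranked")` should give the same Spearman coefficient as the hand-written `spearman` above.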

Pages: 10
