A comparative analysis of biclustering algorithms for gene expression data

Cited by: 161
Authors
Eren, Kemal [1]
Deveci, Mehmet [1]
Kucuktunc, Onur [1]
Catalyurek, Umit V. [2,3]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
[3] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH 43210 USA
Funding
National Institutes of Health; National Science Foundation
Keywords
biclustering; microarray; gene expression; clustering; MICROARRAY DATA; BIOCONDUCTOR; PATTERNS
DOI
10.1093/bib/bbs032
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The need to analyze high-dimensional biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibit similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters.
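The synthetic-data evaluation described in the abstract (implanted biclusters, varying noise) can be illustrated with a small sketch. It assumes a constant bicluster model over a standard-normal background and scores a found bicluster by the Jaccard index of its cells against the implanted one. The function names `make_synthetic`, `jaccard`, and `recovery_score` are hypothetical illustrations; they are not BiBench's API and not necessarily the scoring procedure used in the paper.

```python
import random

# Hypothetical helper (not BiBench's API): a standard-normal background
# with one implanted constant bicluster whose cells are drawn around
# `value` with Gaussian noise of standard deviation `noise`.
def make_synthetic(n_rows, n_cols, rows, cols, value=3.0, noise=0.5, seed=0):
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]
    for i in rows:
        for j in cols:
            data[i][j] = value + rng.gauss(0.0, noise)
    return data

# A bicluster is a (row_indices, column_indices) pair; its cells are
# all (row, column) combinations of those indices.
def cells(bicluster):
    rows, cols = bicluster
    return {(i, j) for i in rows for j in cols}

# Jaccard index between the cell sets of two biclusters.
def jaccard(found, expected):
    a, b = cells(found), cells(expected)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Assumed recovery score: for each expected bicluster, take the best
# Jaccard match among the found ones, then average over expected ones.
def recovery_score(found_list, expected_list):
    if not expected_list:
        raise ValueError("expected_list must not be empty")
    best = [max((jaccard(f, e) for f in found_list), default=0.0)
            for e in expected_list]
    return sum(best) / len(best)

if __name__ == "__main__":
    rows, cols = [0, 1, 2], [0, 1]
    data = make_synthetic(10, 8, rows, cols, value=3.0, noise=0.5, seed=1)
    inside = [data[i][j] for i in rows for j in cols]
    print(f"mean of implanted cells: {sum(inside) / len(inside):.2f}")
    # A partial recovery: two of the three implanted rows, both columns.
    print(recovery_score([([0, 1], [0, 1])], [(rows, cols)]))
```

Raising `noise` makes the implanted cells harder to separate from the background, which is the kind of condition a synthetic benchmark suite varies; overlapping or multiple biclusters could be modeled by implanting several `(rows, cols)` pairs.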
Pages: 279-292
Number of pages: 14