A Bayesian network classification methodology for gene expression data

Cited by: 18
Authors
Helman, P
Veroff, R
Atlas, SR
Willman, C
Affiliations
[1] Univ New Mexico, Dept Comp Sci, Farris Engn Ctr, Albuquerque, NM 87131 USA
[2] Univ New Mexico, Dept Phys & Astron, Albuquerque, NM 87131 USA
[3] Univ New Mexico, Ctr Adv Studies, Albuquerque, NM 87131 USA
[4] Univ New Mexico, Sch Med, Dept Pathol, Albuquerque, NM 87131 USA
[5] Univ New Mexico, Sch Med, Canc Res & Treatment Ctr, Albuquerque, NM 87131 USA
Keywords
Bayesian networks; classification; feature selection; gene expression; microarray data; normalization
DOI
10.1089/cmb.2004.11.581
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We present new techniques for the application of a Bayesian network learning framework to the problem of classifying gene expression data. The focus on classification permits us to develop techniques that address in several ways the complexities of learning Bayesian nets. Our classification model reduces the Bayesian network learning problem to the problem of learning multiple subnetworks, each consisting of a class label node and its set of parent genes. We argue that this classification model is more appropriate for the gene expression domain than are other structurally similar Bayesian network classification models, such as Naive Bayes and Tree Augmented Naive Bayes (TAN), because our model is consistent with prior domain experience suggesting that a relatively small number of genes, taken in different combinations, is required to predict most clinical classes of interest. Within this framework, we consider two different approaches to identifying parent sets which are supported by the gene expression observations and any other currently available evidence. One approach employs a simple greedy algorithm to search the universe of all genes; the second approach develops and applies a gene selection algorithm whose results are incorporated as a prior to enable an exhaustive search for parent sets over a restricted universe of genes. Two other significant contributions are the construction of classifiers from multiple, competing Bayesian network hypotheses and algorithmic methods for normalizing and binning gene expression data in the absence of prior expert knowledge. Our classifiers are developed under a cross validation regimen and then validated on corresponding out-of-sample test sets. The classifiers attain a classification rate in excess of 90% on out-of-sample test sets for two publicly available datasets. We present an extensive compilation of results reported in the literature for other classification methods run against these same two datasets. 
Our results are comparable to, or better than, any we have found reported for these two sets, when a train-test protocol as stringent as ours is followed.
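The greedy parent-set search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring metric or the search details, so everything below is an assumption made for illustration: a BDeu-style Dirichlet marginal likelihood as the score, expression values already binned to small integers, and the hypothetical function names `log_bd_score` and `greedy_parent_set`. It is not the authors' implementation.

```python
from math import lgamma


def log_bd_score(X, y, parents, n_bins=2, n_classes=2, alpha=1.0):
    """Log marginal likelihood of the class labels given a parent gene set.

    Assumption: a BDeu-style Dirichlet prior with equivalent sample size
    `alpha`; the paper's actual metric is not stated in the abstract.
    """
    # Count class labels within each observed parent configuration.
    counts = {}
    for row, label in zip(X, y):
        key = tuple(row[g] for g in parents)
        counts.setdefault(key, [0] * n_classes)[label] += 1

    # Spread the prior mass evenly over all parent configurations and classes.
    n_configs = n_bins ** len(parents)
    a_cfg = alpha / n_configs
    a_cell = a_cfg / n_classes

    score = 0.0
    # Unobserved configurations contribute zero, so only observed ones are summed.
    for cls_counts in counts.values():
        score += lgamma(a_cfg) - lgamma(a_cfg + sum(cls_counts))
        for c in cls_counts:
            score += lgamma(a_cell + c) - lgamma(a_cell)
    return score


def greedy_parent_set(X, y, max_parents=3, **score_kwargs):
    """Add one gene at a time while the score of the parent set improves."""
    n_genes = len(X[0])
    parents = []
    best = log_bd_score(X, y, parents, **score_kwargs)
    while len(parents) < max_parents:
        scored = [
            (log_bd_score(X, y, parents + [g], **score_kwargs), g)
            for g in range(n_genes)
            if g not in parents
        ]
        if not scored:
            break
        top_score, top_gene = max(scored)
        if top_score <= best:
            break  # no single added gene improves the score
        parents.append(top_gene)
        best = top_score
    return parents, best


# Toy data (hypothetical): 8 samples, 4 binned genes; the label copies gene 2.
X_toy = [
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(greedy_parent_set(X_toy, y_toy))
```

Capping `max_parents` at a small value mirrors the abstract's premise that a relatively small number of genes suffices to predict a class. The abstract's second approach, an exhaustive search over a restricted gene universe, would replace the greedy loop with an enumeration of all small subsets of a preselected gene list.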
Pages: 581-615
Page count: 35