Combining Hierarchical Inference in Ontologies with Heterogeneous Data Sources Improves Gene Function Prediction

Cited by: 2
Authors
Jiang, Xiaoyu [1]
Nariai, Naoki [2]
Steffen, Martin [3]
Kasif, Simon [2,4]
Gold, David [1]
Kolaczyk, Eric D. [1]
Affiliations
[1] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
[2] Boston Univ, Biol Informat Proc, Boston, MA 02215 USA
[3] Boston Univ, Dept Genet, Boston, MA 02215 USA
[4] Boston Univ, Dept Biomed Engn, Boston, MA 02215 USA
DOI: 10.1109/BIBM.2008.37
Chinese Library Classification (CLC) number: Q [Biological sciences]
Discipline classification codes: 07; 0710; 09
Abstract
The study of gene function is critical in many genomic and proteomic fields. Because large amounts of diverse protein data are now available, integrating these datasets to predict function has become a significant opportunity in computational biology. In this paper, to predict protein function, we (i) develop a novel Bayesian framework that combines relational, hierarchical, and structural information and uses the data more efficiently than similar methods, and (ii) propose using it together with an integrative protein-protein association network, STRING (Search Tool for the Retrieval of INteracting Genes/proteins), which combines information from seven different sources. At the heart of our work is integrating protein data in a concerted fashion with respect to both the algorithm and the data source. Performance is assessed by 5-fold cross-validation in yeast on selected terms from the Molecular Function ontology of the Gene Ontology database. The results show that combining the proposed computational framework with the STRING protein network substantially improves prediction. The benefit of using an aggressively integrative network such as STRING may stem from the following: although the ultimate gene interaction matrix (including, but not limited to, protein-protein, genetic, and regulatory interactions) is likely to be sparse, it is still only partially known in most organisms, so drawing on multiple distinct data sources is rewarded.
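The abstract only outlines the approach. As a rough illustration of how relational (network) evidence and the GO hierarchy can be combined, the Python sketch below scores a child term as P(child | parent, neighbours) × P(parent | neighbours), using pseudocount-smoothed weighted neighbour voting, and evaluates it with k-fold cross-validation that hides the held-out annotations. This is not the authors' Bayesian framework and does not query STRING: the toy network and its weights, the term names `binding` and `protein_binding`, the `prior`/`strength` smoothing parameters, and the functions `build_graph`, `conditional_score`, `hierarchical_score`, and `cross_validate` are all hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical toy association network: (protein, protein, confidence in [0, 1]).
# The weights only mimic STRING-style combined scores; they are not real data.
EDGES = [
    ("A", "B", 0.9), ("A", "C", 0.7), ("B", "C", 0.4),
    ("C", "D", 0.8), ("D", "E", 0.6), ("B", "E", 0.3),
]
# Hypothetical two-term hierarchy: child term -> parent term (None marks the root).
PARENT = {"binding": None, "protein_binding": "binding"}
# Known annotations, consistent with the true-path rule (child implies parent).
ANNOT = {
    "A": {"binding", "protein_binding"},
    "B": {"binding", "protein_binding"},
    "C": {"binding"},
    "D": set(),
    "E": {"binding"},
}


def build_graph(edges):
    """Return an undirected weighted adjacency map {protein: {neighbour: weight}}."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def conditional_score(graph, annot, protein, term, prior=0.5, strength=1.0):
    """Smoothed weighted neighbour vote for P(term | parent term, neighbours).

    Only neighbours with known labels that also carry the parent term are
    counted (for a root term, every neighbour with known labels counts).
    The vote is shrunk toward `prior` with pseudo-count `strength`.
    """
    parent = PARENT[term]
    num, den = prior * strength, strength
    for nbr, w in graph[protein].items():
        labels = annot[nbr]
        if labels is None:
            continue  # held-out neighbour: its labels are treated as unknown
        if parent is not None and parent not in labels:
            continue  # outside the parent's extension, uninformative for the child
        den += w
        if term in labels:
            num += w
    return num / den


def hierarchical_score(graph, annot, protein, term):
    """Chain rule down the hierarchy: P(term) = P(term | parent) * P(parent)."""
    score = conditional_score(graph, annot, protein, term)
    parent = PARENT[term]
    if parent is not None:
        score *= hierarchical_score(graph, annot, protein, parent)
    return score


def cross_validate(graph, annot, term, k=5):
    """k-fold CV: hide each fold's labels, then score its proteins for `term`."""
    proteins = sorted(annot)
    folds = [proteins[i::k] for i in range(k)]
    predictions = {}
    for fold in folds:
        masked = {p: (None if p in fold else labels) for p, labels in annot.items()}
        for p in fold:
            predictions[p] = hierarchical_score(graph, masked, p, term)
    return predictions


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for protein, score in cross_validate(graph, ANNOT, "protein_binding").items():
        print(protein, round(score, 3))
```

Because each conditional factor lies in [0, 1], the chained score for a child term can never exceed its parent's score, which mirrors the GO true-path rule. Marking held-out proteins as unknown (`None`) rather than as unannotated keeps proteins in the same fold from counting as negative evidence for one another.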
Pages: 411 / +
Number of pages: 2
Related papers (50 in total)
  • [31] Integrating Heterogeneous Sources for Learned Prediction of Vehicular Data Consumption
    Zang, Andi
    Zhu, Xiaofeng
    Li, Ce
    Zhou, Fan
    Trajcevski, Goce
    2022 23RD IEEE INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM 2022), 2022: 54-63
  • [32] A BAYESIAN HIERARCHICAL MODEL FOR COMBINING MULTIPLE DATA SOURCES IN POPULATION SIZE ESTIMATION
    Parsons, Jacob
    Niu, Xiaoyue
    Bao, Le
    ANNALS OF APPLIED STATISTICS, 2022, 16(3): 1550-1562
  • [34] Methodology for the inference of gene function from phenotype data
    Ascensao, Joao A.
    Dolan, Mary E.
    Hill, David P.
    Blake, Judith A.
    BMC BIOINFORMATICS, 2014, 15
  • [35] Combining heterogeneous data sources for spatio-temporal mobility demand forecasting
    Prado-Rujas, Ignacio-Iker
    Serrano, Emilio
    Garcia-Dopico, Antonio
    Cordoba, M. Luisa
    Perez, Maria S.
    INFORMATION FUSION, 2023, 91: 1-12
  • [36] CombFunc: predicting protein function using heterogeneous data sources
    Wass, Mark N.
    Barton, Geraint
    Sternberg, Michael J. E.
    NUCLEIC ACIDS RESEARCH, 2012, 40(W1): W466-W470
  • [37] Annotating gene function by combining expression data with a modular gene network
    Shiga, Motoki
    Takigawa, Ichigaku
    Mamitsuka, Hiroshi
    BIOINFORMATICS, 2007, 23(13): i468-i478
  • [38] Combining gene mutation with gene expression analysis improves outcome prediction in acute promyelocytic leukemia
    Lucena-Araujo, Antonio R.
    Coelho-Silva, Juan L.
    Pereira-Martins, Diego A.
    Silveira, Douglas R.
    Koury, Luisa C.
    Melo, Raul A. M.
    Bittencourt, Rosane
    Pagnano, Katia
    Pasquini, Ricardo
    Nunes, Elenaide C.
    Fagundes, Evandro M.
    Gloria, Ana B.
    Kerbauy, Fabio
    Chauffaille, Maria de Lourdes
    Bendit, Israel
    Rocha, Vanderson
    Keating, Armand
    Tallman, Martin S.
    Ribeiro, Raul C.
    Dillon, Richard
    Ganser, Arnold
    Lowenberg, Bob
    Valk, P. J. M.
    Lo-Coco, Francesco
    Sanz, Miguel A.
    Berliner, Nancy
    Rego, Eduardo M.
    BLOOD, 2019, 132(12): 951-959
  • [39] Integration of Multiple Data Sources for Gene Network Inference using Genetic Perturbation Data
    Liang, Xiao
    Young, William Chad
    Hung, Ling-Hong
    Raftery, Adrian E.
    Yeung, Ka Yee
    ACM-BCB'18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2018: 601-602
  • [40] Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data
    Liang, Xiao
    Young, William Chad
    Hung, Ling-Hong
    Raftery, Adrian E.
    Yeung, Ka Yee
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2019, 26(10): 1113-1129