Combining Hierarchical Inference in Ontologies with Heterogeneous Data Sources Improves Gene Function Prediction

Cited by: 2
Authors
Jiang, Xiaoyu [1]
Nariai, Naoki [2]
Steffen, Martin [3]
Kasif, Simon [2, 4]
Gold, David [1]
Kolaczyk, Eric D. [1]
Affiliations
[1] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
[2] Boston Univ, Biol Informat Proc, Boston, MA 02215 USA
[3] Boston Univ, Dept Genet, Boston, MA 02215 USA
[4] Boston Univ, Dept Biomed Engn, Boston, MA 02215 USA
Keywords
DOI
10.1109/BIBM.2008.37
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification codes
07; 0710; 09
Abstract
The study of gene function is critical in various genomic and proteomic fields. Due to the availability of tremendous amounts of different types of protein data, integrating these datasets to predict function has become a significant opportunity in computational biology. In this paper, to predict protein function we (i) develop a novel Bayesian framework combining relational, hierarchical and structural information with improvement in data usage efficiency over similar methods, and (ii) propose to use it in conjunction with an integrative protein-protein association network, STRING (Search Tool for the Retrieval of INteracting Genes/proteins), which combines information from seven different sources. At the heart of our work is accomplishing protein data integration in a concerted fashion with respect to algorithm and data source. Method performance is assessed by a 5-fold cross-validation in yeast on selected terms from the Molecular Function ontology in the Gene Ontology database. Results show that our combined use of the proposed computational framework and the protein network from STRING offers substantial improvements in prediction. The benefits of using an aggressively integrative network, such as STRING, may derive from the fact that although it is likely that the ultimate gene interaction matrix (including but not limited to protein-protein, genetic, or regulatory interactions) will be sparse, presently it is still known only incompletely in most organisms, and thus the use of multiple distinct data sources is rewarded.
Pages: 411 / +
Page count: 2
Related papers
50 in total
  • [1] A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)
    Troyanskaya, OG
    Dolinski, K
    Owen, AB
    Altman, RB
    Botstein, D
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2003, 100 (14) : 8348 - 8353
  • [2] Prediction of Gene Function Using Ensembles of SVMs and Heterogeneous Data Sources
    Re, Matteo
    Valentini, Giorgio
    APPLICATIONS OF SUPERVISED AND UNSUPERVISED ENSEMBLE METHODS, 2009, 245 : 79 - 91
  • [3] A Bayesian Hierarchical Network for Combining Heterogeneous Data Sources in Medical Diagnoses
    Donnat, Claire
    Miolane, Nina
    Bunbury, Freddy
    Kreindler, Jack
    MACHINE LEARNING FOR HEALTH, VOL 136, 2020, 136 : 53 - 84
  • [4] GO-At: in silico prediction of gene function in Arabidopsis thaliana by combining heterogeneous data
    Bradford, James R.
    Needham, Chris J.
    Tedder, Philip
    Care, Matthew A.
    Bulpitt, Andrew J.
    Westhead, David R.
    PLANT JOURNAL, 2010, 61 (04) : 713 - 721
  • [5] Protein Expression Data Improves Gene Function Prediction
    Yang, Huadong
    Song, Xiaofeng
    Guo, Xuejiang
    2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016, : 1869 - 1870
  • [6] Incorporating function ontologies into the integration of data sources
    Tsai, HJ
    Xu, J
    Lin, S
    Miller, LL
    COMPUTERS AND THEIR APPLICATIONS, 2003, : 184 - 187
  • [7] Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines
    Re, Matteo
    Valentini, Giorgio
    NEUROCOMPUTING, 2010, 73 (7-9) : 1533 - 1537
  • [8] Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes
    Moritz Gerstung
    Andrea Pellagatti
    Luca Malcovati
    Aristoteles Giagounidis
    Matteo G Della Porta
    Martin Jädersten
    Hamid Dolatshad
    Amit Verma
    Nicholas C. P. Cross
    Paresh Vyas
    Sally Killick
    Eva Hellström-Lindberg
    Mario Cazzola
    Elli Papaemmanuil
    Peter J. Campbell
    Jacqueline Boultwood
    Nature Communications, 6
  • [9] Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes
    Gerstung, Moritz
    Pellagatti, Andrea
    Malcovati, Luca
    Giagounidis, Aristoteles
    Della Porta, Matteo G.
    Jaedersten, Martin
    Dolatshad, Hamid
    Verma, Amit
    Cross, Nicholas C. P.
    Vyas, Paresh
    Killick, Sally
    Hellstroem-Lindberg, Eva
    Cazzola, Mario
    Papaemmanuil, Elli
    Campbell, Peter J.
    Boultwood, Jacqueline
    NATURE COMMUNICATIONS, 2015, 6
  • [10] Gene clustering and gene function prediction using multiple sources of data
    Zare, Hossein
    Khodursky, Arkady B.
    Kaveh, Mostafa
    2006 IEEE INTERNATIONAL WORKSHOP ON GENOMIC SIGNAL PROCESSING AND STATISTICS, 2006, : 113 - +