Combining Hierarchical Inference in Ontologies with Heterogeneous Data Sources Improves Gene Function Prediction

Cited by: 2
Authors
Jiang, Xiaoyu [1]
Nariai, Naoki [2]
Steffen, Martin [3]
Kasif, Simon [2, 4]
Gold, David [1]
Kolaczyk, Eric D. [1]
Affiliations
[1] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
[2] Boston Univ, Biol Informat Proc, Boston, MA 02215 USA
[3] Boston Univ, Dept Genet, Boston, MA 02215 USA
[4] Boston Univ, Dept Biomed Engn, Boston, MA 02215 USA
DOI
10.1109/BIBM.2008.37
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09
Abstract
The study of gene function is critical in various genomic and proteomic fields. Because large amounts of diverse protein data are now available, integrating these datasets to predict function has become a significant opportunity in computational biology. In this paper, to predict protein function we (i) develop a novel Bayesian framework that combines relational, hierarchical, and structural information and uses data more efficiently than similar methods, and (ii) propose to use it in conjunction with an integrative protein-protein association network, STRING (Search Tool for the Retrieval of INteracting Genes/proteins), which combines information from seven different sources. At the heart of our work is integrating protein data in a concerted fashion with respect to both algorithm and data source. Method performance is assessed by 5-fold cross-validation in yeast on selected terms from the Molecular Function ontology in the Gene Ontology database. Results show that our combined use of the proposed computational framework and the protein network from STRING offers substantial improvements in prediction. The benefits of using an aggressively integrative network such as STRING may derive from the following: although the ultimate gene interaction matrix (including but not limited to protein-protein, genetic, or regulatory interactions) is likely to be sparse, it is presently known only incompletely in most organisms, so the use of multiple distinct data sources is rewarded.
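As a rough illustration of the general idea in the abstract (scoring a term from a weighted association network, then reconciling the scores with the ontology hierarchy), the sketch below uses a weighted neighbour vote followed by a top-down cap that keeps a child term from scoring higher than its parent. This is not the authors' Bayesian framework. The function names, gene names, term names, and numbers are all hypothetical, and the hierarchy is simplified to a tree with one parent per term, whereas the Gene Ontology is a directed acyclic graph.

```python
from typing import Dict, Optional, Set


def weighted_vote(neighbors: Dict[str, float], annotated: Set[str]) -> float:
    """Share of a gene's total edge weight that goes to annotated neighbours.

    `neighbors` maps neighbour gene -> edge weight (assumed here to be a
    STRING-style confidence score in [0, 1]); `annotated` is the set of
    genes known to carry the term being scored.
    """
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return sum(w for g, w in neighbors.items() if g in annotated) / total


def enforce_hierarchy(
    scores: Dict[str, float], parent: Dict[str, Optional[str]]
) -> Dict[str, float]:
    """Cap each term's score at the (already capped) score of its parent.

    Assumes a tree: `parent[term]` is the single parent, or None for the root.
    """
    fixed: Dict[str, float] = {}

    def resolve(term: str) -> float:
        if term not in fixed:
            p = parent[term]
            cap = 1.0 if p is None else resolve(p)
            fixed[term] = min(scores[term], cap)
        return fixed[term]

    for term in scores:
        resolve(term)
    return fixed


if __name__ == "__main__":
    # Hypothetical neighbourhood of one unannotated gene.
    neighbors = {"geneA": 0.9, "geneB": 0.6, "geneC": 0.5}
    print(weighted_vote(neighbors, {"geneA", "geneB"}))

    # Hypothetical raw per-term scores that violate the hierarchy:
    # the child "dna_binding" scores higher than its parent "binding".
    raw = {"molecular_function": 0.95, "binding": 0.6, "dna_binding": 0.8}
    tree = {
        "molecular_function": None,
        "binding": "molecular_function",
        "dna_binding": "binding",
    }
    print(enforce_hierarchy(raw, tree))
```

Capping is only one simple way to make scores consistent with the hierarchy. A probabilistic model such as the one the abstract describes would instead combine the network and hierarchy evidence jointly.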
Pages: 411 / +
Page count: 2