Combining Hierarchical Inference in Ontologies with Heterogeneous Data Sources Improves Gene Function Prediction

Cited by: 2
Authors
Jiang, Xiaoyu [1 ]
Nariai, Naoki [2 ]
Steffen, Martin [3 ]
Kasif, Simon [2 ,4 ]
Gold, David [1 ]
Kolaczyk, Eric D. [1 ]
Affiliations
[1] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
[2] Boston Univ, Biol Informat Proc, Boston, MA 02215 USA
[3] Boston Univ, Dept Genet, Boston, MA 02215 USA
[4] Boston Univ, Dept Biomed Engn, Boston, MA 02215 USA
DOI
10.1109/BIBM.2008.37
Chinese Library Classification number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
The study of gene function is critical to many genomic and proteomic fields. Because tremendous amounts of protein data of many different types are now available, integrating these datasets to predict function has become a significant opportunity in computational biology. In this paper, to predict protein function, we (i) develop a novel Bayesian framework that combines relational, hierarchical, and structural information and uses data more efficiently than similar methods, and (ii) propose using it together with an integrative protein-protein association network, STRING (Search Tool for the Retrieval of INteracting Genes/proteins), which combines information from seven different sources. At the heart of our work is integrating protein data in a concerted fashion, with respect to both the algorithm and the data source. Method performance is assessed by 5-fold cross-validation in yeast on selected terms from the Molecular Function ontology of the Gene Ontology (GO) database. Results show that using the proposed computational framework together with the STRING protein network substantially improves prediction. The benefit of an aggressively integrative network such as STRING may stem from the following: although the ultimate gene interaction matrix (including, but not limited to, protein-protein, genetic, and regulatory interactions) is likely to be sparse, it is still known only incompletely in most organisms, so drawing on multiple distinct data sources pays off.
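The abstract does not spell out the Bayesian model itself. As a rough illustration only, and not the authors' framework, the sketch below shows one common way to let the GO hierarchy shape network-based prediction: because a protein annotated with a child term is also annotated with its parent, the chain rule gives P(child) = P(parent) × P(child | parent), and the conditional can be estimated only from neighbours known to carry the parent term. The toy network, protein names, and the functions `neighbor_score` and `hierarchical_score` are hypothetical; edge weights stand in for STRING-style association scores rescaled to [0, 1], and the weighted-neighbour vote is a deliberately simple stand-in for a real probabilistic model.

```python
from typing import Dict, Tuple

# Hypothetical undirected network: (protein_a, protein_b) -> weight in [0, 1],
# e.g. an association score from an integrative network, rescaled.
Network = Dict[Tuple[str, str], float]


def neighbor_score(protein: str, network: Network, labels: Dict[str, int]) -> float:
    """Weighted fraction of the labelled neighbours of `protein` that carry the term.

    `labels` maps protein -> 1 (annotated) or 0 (not annotated).
    Returns 0.0 when the protein has no labelled neighbour (an arbitrary default).
    """
    num = den = 0.0
    for (a, b), w in network.items():
        if protein not in (a, b):
            continue
        other = b if a == protein else a
        if other in labels:
            den += w
            num += w * labels[other]
    return num / den if den > 0 else 0.0


def hierarchical_score(
    protein: str,
    network: Network,
    parent_labels: Dict[str, int],
    child_labels: Dict[str, int],
) -> float:
    """Score a child GO term as P(parent) * P(child | parent).

    The conditional uses only neighbours annotated with the parent term, so the
    child score can never exceed the parent score.
    """
    p_parent = neighbor_score(protein, network, parent_labels)
    # Restrict child-term evidence to parent-positive neighbours.
    conditional_labels = {
        p: child_labels[p]
        for p, y in parent_labels.items()
        if y == 1 and p in child_labels
    }
    p_child_given_parent = neighbor_score(protein, network, conditional_labels)
    return p_parent * p_child_given_parent


if __name__ == "__main__":
    network = {("YA", "YB"): 0.9, ("YA", "YC"): 0.5, ("YA", "YD"): 0.2}
    parent_labels = {"YB": 1, "YC": 1, "YD": 0}
    child_labels = {"YB": 1, "YC": 0, "YD": 0}
    # P(parent) = 1.4 / 1.6, P(child | parent) = 0.9 / 1.4, product = 0.9 / 1.6
    print(round(hierarchical_score("YA", network, parent_labels, child_labels), 4))  # prints 0.5625
```

Restricting the conditional to parent-positive neighbours is what makes the hierarchy useful here: evidence from proteins that cannot carry the child term (because they lack the parent) does not dilute the child-term estimate.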
Pages: 411 / +
Number of pages: 2
Related papers
50 in total
  • [41] Prediction of RNA subcellular localization: Learning from heterogeneous data sources
    Savulescu, Anca Flavia
    Bouilhol, Emmanuel
    Beaume, Nicolas
    Nikolski, Macha
    ISCIENCE, 2021, 24 (11)
  • [43] A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data
    Yao, ZZ
    Ruzzo, WL
    BMC BIOINFORMATICS, 2006, 7 (Suppl 1)
  • [45] Combining Machine Learning with Metabolomic and Embryologic Data Improves Embryo Implantation Prediction
    Cheredath, Aswathi
    Uppangala, Shubhashree
    Asha, C. S.
    Jijo, Ameya
    Lakshmi, Vani R.
    Kumar, Pratap
    Joseph, David
    Gowda, Nagana G. A.
    Kalthur, Guruprasad
    Adiga, Satish Kumar
    REPRODUCTIVE SCIENCES, 2023, 30 (03) : 984 - 994
  • [46] Improving the prediction of protein binding sites by combining heterogeneous data and Voronoi diagrams
    Segura, Joan
    Jones, Pamela F.
    Fernandez-Fuentes, Narcis
    BMC BIOINFORMATICS, 2011, 12
  • [48] Combining Donor Characteristics with Immunohistological Data Improves the Prediction of Islet Isolation Success
    Berkova, Zuzana
    Saudek, Frantisek
    Girman, Peter
    Zacharovova, Klara
    Kriz, Jan
    Fabryova, Eva
    Leontovyc, Ivan
    Koblas, Tomas
    Kosinova, Lucie
    Neskudla, Tomas
    Vavrova, Ema
    Habart, David
    Loukotova, Sarka
    Zahradnicka, Martina
    Lipar, Kvetoslav
    Voska, Ludek
    Skibova, Jelena
    JOURNAL OF DIABETES RESEARCH, 2016, 2016
  • [49] Optimizing data integration improves gene regulatory network inference in Arabidopsis thaliana
    Cassan, Oceane
    Lecellier, Charles-Henri
    Martin, Antoine
    Brehelin, Laurent
    Lebre, Sophie
    BIOINFORMATICS, 2024, 40 (07)
  • [50] Modelling of zero-inflation improves inference of metagenomic gene count data
    Jonsson, Viktor
    Osterlund, Tobias
    Nerman, Olle
    Kristiansson, Erik
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2019, 28 (12) : 3712 - 3728