Prediction and Validation of Gene-Disease Associations Using Methods Inspired by Social Network Analyses

Cited by: 117
Authors
Singh-Blom, U. Martin [1, 2]
Natarajan, Nagarajan [3]
Tewari, Ambuj [4]
Woods, John O. [1]
Dhillon, Inderjit S. [3]
Marcotte, Edward M. [1, 5]
Affiliations
[1] Univ Texas Austin, Ctr Syst & Synthet Biol, Inst Cellular & Mol Biol, Austin, TX 78712 USA
[2] Karolinska Inst, Dept Med, Stockholm, Sweden
[3] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[4] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
[5] Univ Texas Austin, Dept Chem & Biochem, Austin, TX 78712 USA
Source
PLOS ONE | 2013 / Vol. 8 / Issue 05
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
GENOME; DATABASE; PRIORITIZATION; IDENTIFICATION; INTEGRATION; PHENOTYPE; RESOURCE; BIOLOGY; WALKING; MODELS;
DOI
10.1371/journal.pone.0058977
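Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract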
Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets, we can leverage statistical and machine learning methods to help achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated by its success in social network link prediction and is closely related to some recently proposed methods for gene-disease association inference. The second method, CATAPULT (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine whose features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies on two distinct data sets, namely OMIM phenotypes and drug-target interactions. Finally, we show that although both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas CATAPULT is better suited to correctly identifying gene-trait associations overall.
The authors want to thank Jon Laurent and Kris McGary for some of the data used, and Li and Patra for making their code available. Most of Ambuj Tewari's contribution to this work happened while he was a postdoctoral fellow at the University of Texas at Austin.
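For readers unfamiliar with the Katz measure named in the abstract, the following is a minimal sketch of the generic link-prediction form, in which the score between two nodes is the sum over walk lengths l of beta^l times the number of length-l walks between them. It is not the authors' implementation: the function name `katz_scores`, the damping value `beta = 0.1`, the truncation length, and the four-node toy network (two genes, one trait, one isolated gene) are all assumptions made purely for illustration.

```python
import numpy as np

def katz_scores(adjacency, beta, max_length=None):
    """Katz similarity matrix: sum over l >= 1 of beta**l * A**l.

    If max_length is None, the closed form (I - beta*A)^-1 - I is used,
    which requires beta < 1 / (spectral radius of A) for the series to converge.
    Otherwise the series is truncated after max_length terms.
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    if max_length is None:
        spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))
        if beta * spectral_radius >= 1:
            raise ValueError("beta too large: the Katz series does not converge")
        return np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)
    scores = np.zeros_like(A)
    term = np.eye(n)
    for _ in range(max_length):
        term = beta * (term @ A)  # after l iterations: beta**l * A**l
        scores += term
    return scores

# Hypothetical toy network (illustrative only):
# node 0 = gene_a, node 1 = gene_b, node 2 = trait_t, node 3 = gene_c (isolated)
# edges: gene_a - gene_b (functional link), gene_b - trait_t (known association)
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])

beta = 0.1  # assumed damping factor
truncated = katz_scores(adjacency, beta, max_length=3)
closed_form = katz_scores(adjacency, beta)

# Rank candidate genes for trait_t (column 2) by their truncated Katz score
genes = {"gene_a": 0, "gene_b": 1, "gene_c": 3}
ranking = sorted(genes, key=lambda g: truncated[genes[g], 2], reverse=True)
print(ranking)
```

In this toy setting gene_b, which is directly linked to the trait, scores highest, gene_a receives a smaller score through its two-step connection, and the isolated gene_c scores zero. Shorter and more numerous walks contribute more to the score.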
Pages: 17