In silico prediction of novel therapeutic targets using gene-disease association data

Cited by: 66
Authors
Ferrero, Enrico [1]
Dunham, Ian [2, 3]
Sanseau, Philippe [1, 3]
Affiliations
[1] GSK Med Res Ctr, Computat Biol & Stats, Target Sci, Gunnels Wood Rd, Stevenage SG1 2NY, Herts, England
[2] EMBL, EBI, Wellcome Genome Campus, Cambridge CB10 1SD, England
[3] Open Targets, Wellcome Genome Campus, Cambridge CB10 1SD, England
Keywords
Drug discovery; Target discovery; Gene-disease associations; Machine learning; Data mining; MATRIX METALLOPROTEINASES; DRUG-TARGETS; EXPRESSION; IDENTIFICATION; INTEGRATION; BROMODOMAIN; ALGORITHMS; SELECTION; SUPPORT; KV2.1
DOI
10.1186/s12967-017-1285-6
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Background: Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market.
Methods: To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets.
Results: We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature.
Conclusions: Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.
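Illustrative note: the nested cross-validation mentioned in the Methods can be sketched in a few lines. This is a minimal, standard-library sketch and not the authors' pipeline. A toy single-feature threshold classifier stands in for the four models named in the abstract, and all names (`make_toy_data`, `fit_threshold_classifier`, `nested_cv`) and the synthetic data are hypothetical. The inner loop picks a hyperparameter (here, which feature to threshold) using only the outer-training data, and the outer loop scores that choice on data the selection never saw.

```python
import random
from statistics import mean


def make_toy_data(n=200, seed=1):
    """Synthetic data (an assumption): feature 0 is informative, feature 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 1.5 if label == 1 else -1.5
        X.append([rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


def k_fold_indices(n, k, rng):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold_classifier(X, y, feature):
    """Toy model: threshold one feature at the midpoint of the two class means."""
    pos = [x[feature] for x, label in zip(X, y) if label == 1]
    neg = [x[feature] for x, label in zip(X, y) if label == 0]
    if not pos or not neg:
        # Degenerate split with a single class: always predict that class.
        only = 1 if pos else 0
        return lambda x: only
    threshold = (mean(pos) + mean(neg)) / 2
    sign = 1 if mean(pos) >= mean(neg) else -1
    return lambda x: 1 if sign * (x[feature] - threshold) > 0 else 0


def accuracy(model, X, y):
    return mean(1 if model(x) == label else 0 for x, label in zip(X, y))


def nested_cv(X, y, candidates, outer_k=5, inner_k=3, seed=0):
    """Return one held-out accuracy per outer fold."""
    rng = random.Random(seed)
    n = len(X)
    outer_scores = []
    for test_idx in k_fold_indices(n, outer_k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        inner_folds = k_fold_indices(len(train_idx), inner_k, rng)

        def inner_score(feature):
            # Mean validation accuracy over inner folds, outer-training data only.
            scores = []
            for val_pos in inner_folds:
                val_set = set(val_pos)
                fit_i = [train_idx[p] for p in range(len(train_idx)) if p not in val_set]
                val_i = [train_idx[p] for p in val_pos]
                model = fit_threshold_classifier(
                    [X[i] for i in fit_i], [y[i] for i in fit_i], feature)
                scores.append(accuracy(model, [X[i] for i in val_i], [y[i] for i in val_i]))
            return mean(scores)

        best = max(candidates, key=inner_score)
        # Refit on the full outer-training set, then score on the held-out fold.
        model = fit_threshold_classifier(
            [X[i] for i in train_idx], [y[i] for i in train_idx], best)
        outer_scores.append(
            accuracy(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return outer_scores


X, y = make_toy_data()
scores = nested_cv(X, y, candidates=[0, 1])
print("outer-fold accuracies:", scores)
```

Keeping model selection inside the outer-training folds is what prevents the reported score from being inflated by the tuning step. A real analysis would swap the toy classifier for actual models and would typically still hold back a separate independent test set, as the abstract describes.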
Pages: 16