In silico prediction of novel therapeutic targets using gene-disease association data

Cited by: 66
Authors
Ferrero, Enrico [1]
Dunham, Ian [2, 3]
Sanseau, Philippe [1, 3]
Affiliations
[1] GSK Med Res Ctr, Computat Biol & Stats, Target Sci, Gunnels Wood Rd, Stevenage SG1 2NY, Herts, England
[2] EMBL, EBI, Wellcome Genome Campus, Cambridge CB10 1SD, England
[3] Open Targets, Wellcome Genome Campus, Cambridge CB10 1SD, England
Keywords
Drug discovery; Target discovery; Gene-disease associations; Machine learning; Data mining; MATRIX METALLOPROTEINASES; DRUG-TARGETS; EXPRESSION; IDENTIFICATION; INTEGRATION; BROMODOMAIN; ALGORITHMS; SELECTION; SUPPORT; KV2.1
DOI
10.1186/s12967-017-1285-6
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Background: Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market.
Methods: To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets.
Results: We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature.
Conclusions: Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.
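The nested cross-validation mentioned in the Methods can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' pipeline: a k-nearest-neighbour classifier stands in for the four models named in the abstract, the data are synthetic two-class points rather than Open Targets evidence scores, and every function name and parameter (`k_folds`, `knn_predict`, `accuracy`, `nested_cv`, `k_grid`) is hypothetical.

```python
import random
from statistics import mean


def k_folds(indices, k, seed=0):
    """Shuffle the indices and split them into k roughly equal folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote of its k nearest training rows."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test rows whose label the k-NN stand-in classifier predicts."""
    return mean([1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in test])


def nested_cv(data, k_grid=(1, 3, 5), outer_k=5, inner_k=3, seed=0):
    """Return (outer-fold accuracies, hyperparameter chosen in each outer fold)."""
    outer = k_folds(range(len(data)), outer_k, seed)
    scores, chosen = [], []
    for i, test_idx in enumerate(outer):
        train_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
        # Inner loop: tune k using only the outer-training rows, so the
        # held-out outer fold never influences model selection.
        inner = k_folds(train_idx, inner_k, seed + 1)

        def inner_score(k):
            accs = []
            for v, val_idx in enumerate(inner):
                fit_idx = [j for f, fold in enumerate(inner) if f != v for j in fold]
                accs.append(
                    accuracy([data[j] for j in fit_idx], [data[j] for j in val_idx], k)
                )
            return mean(accs)

        best_k = max(k_grid, key=inner_score)
        # Outer loop: refit on all outer-training rows and score the held-out fold.
        scores.append(
            accuracy([data[j] for j in train_idx], [data[j] for j in test_idx], best_k)
        )
        chosen.append(best_k)
    return scores, chosen


# Synthetic toy data (an assumption for illustration): two Gaussian blobs.
rng = random.Random(42)
data = [
    ((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), label)
    for label, c in ((0, 0.0), (1, 2.0))
    for _ in range(30)
]
scores, chosen = nested_cv(data)
print("outer-fold accuracies:", scores)
print("selected k per fold:", chosen)
```

The mean of the outer-fold accuracies estimates how the whole procedure, tuning included, would perform on unseen data, which is why the inner folds are drawn only from each outer training split.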
Pages: 16