In silico prediction of novel therapeutic targets using gene-disease association data

Times cited: 66
Authors
Ferrero, Enrico [1]
Dunham, Ian [2, 3]
Sanseau, Philippe [1, 3]
Affiliations
[1] GSK Med Res Ctr, Computat Biol & Stats, Target Sci, Gunnels Wood Rd, Stevenage SG1 2NY, Herts, England
[2] EMBL, EBI, Wellcome Genome Campus, Cambridge CB10 1SD, England
[3] Open Targets, Wellcome Genome Campus, Cambridge CB10 1SD, England
Keywords
Drug discovery; Target discovery; Gene-disease associations; Machine learning; Data mining; MATRIX METALLOPROTEINASES; DRUG-TARGETS; EXPRESSION; IDENTIFICATION; INTEGRATION; BROMODOMAIN; ALGORITHMS; SELECTION; SUPPORT; KV2.1;
DOI
10.1186/s12967-017-1285-6
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Background: Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market.
Methods: To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets.
Results: We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature.
Conclusions: Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.
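The nested cross-validation mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the classifier is a toy one-feature threshold rule standing in for the four models named in the abstract, the data are synthetic, and every name here (k_folds, fit_threshold, nested_cv, the shift hyperparameter) is hypothetical. The point is only the structure: an inner loop picks the hyperparameter, and an outer loop scores that choice on data the inner loop never saw.

```python
import random
from statistics import mean

def k_folds(indices, k):
    """Split a list of indices into k interleaved folds."""
    return [indices[i::k] for i in range(k)]

def fit_threshold(xs, ys, shift):
    """Toy classifier: midpoint between class means, moved by `shift` (the hyperparameter)."""
    pos = mean(x for x, y in zip(xs, ys) if y == 1)
    neg = mean(x for x, y in zip(xs, ys) if y == 0)
    return (pos + neg) / 2 + shift

def accuracy(threshold, xs, ys):
    """Fraction of samples where 'x > threshold' agrees with the label."""
    return mean(int((x > threshold) == (y == 1)) for x, y in zip(xs, ys))

def evaluate(train_idx, test_idx, x, y, shift):
    """Fit on the training indices, then score on the held-out indices."""
    t = fit_threshold([x[i] for i in train_idx], [y[i] for i in train_idx], shift)
    return accuracy(t, [x[i] for i in test_idx], [y[i] for i in test_idx])

def nested_cv(x, y, shifts, outer_k=5, inner_k=4):
    """Return one held-out accuracy per outer fold."""
    outer = k_folds(list(range(len(x))), outer_k)
    outer_scores = []
    for o, test_idx in enumerate(outer):
        # Everything outside the current outer fold is available for tuning.
        train_idx = [i for f, fold in enumerate(outer) if f != o for i in fold]
        inner = k_folds(train_idx, inner_k)

        def inner_score(shift):
            # Mean validation accuracy of this hyperparameter over the inner folds.
            return mean(
                evaluate(
                    [i for g, fold in enumerate(inner) if g != v for i in fold],
                    val_idx, x, y, shift,
                )
                for v, val_idx in enumerate(inner)
            )

        best_shift = max(shifts, key=inner_score)
        # Refit on the full outer training set and score on the untouched outer fold.
        outer_scores.append(evaluate(train_idx, test_idx, x, y, best_shift))
    return outer_scores

# Synthetic, balanced two-class data (an assumption made purely for illustration).
rng = random.Random(0)
y = [i % 2 for i in range(100)]
x = [rng.gauss(2.0 * label, 1.0) for label in y]
scores = nested_cv(x, y, shifts=[-0.5, 0.0, 0.5])
print(round(mean(scores), 2))
```

In practice a library such as scikit-learn would replace the hand-written folds and classifier; the sketch keeps to the standard library so that the two loops stay visible.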
Pages: 16