Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery

Cited by: 104
Authors
Bosc, Nicolas [1]
Atkinson, Francis [1]
Felix, Eloy [1]
Gaulton, Anna [1]
Hersey, Anne [1]
Leach, Andrew R. [1]
Affiliations
[1] European Bioinformatics Institute (EMBL-EBI), Chemogenomics Team, Wellcome Genome Campus, Cambridge CB10 1SD, England
Funding
Wellcome Trust (UK); European Union Horizon 2020
Keywords
QSAR; Mondrian conformal prediction; ChEMBL; Classification models; Cheminformatics; Applicability domain; Classification; Database; Chemicals; Design
DOI
10.1186/s13321-018-0325-4
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Structure-activity relationship modelling is frequently used in the early stage of drug discovery to assess the activity of a compound on one or several targets, and can also be used to assess the interaction of compounds with liability targets. QSAR models have been used for these and related applications over many years, with good success. Conformal prediction is a relatively new QSAR approach that provides information on the certainty of a prediction, and so helps in decision-making. However, it is not always clear how best to make use of this additional information. In this article, we describe a case study that directly compares conformal prediction with traditional QSAR methods for large-scale predictions of target-ligand binding. The ChEMBL database was used to extract a data set comprising data from 550 human protein targets with different bioactivity profiles. For each target, a QSAR model and a conformal predictor were trained and their results compared. To simulate a real-world application, the models were then evaluated on new data published since the original models were built. The comparative study highlights the similarities between the two techniques, but also some differences that are important to bear in mind when the methods are used in practical drug discovery applications.
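The abstract's point that a conformal predictor reports the certainty of a prediction can be illustrated with a minimal sketch of a Mondrian (class-conditional) inductive conformal classifier. This is not the authors' implementation: the function names, the nonconformity measure (one minus the probability an underlying model assigns to a class), and the toy probabilities are all assumptions made for illustration.

```python
from collections import defaultdict


def calibrate(cal_probs, cal_labels):
    """Group calibration nonconformity scores by true class (the Mondrian step).

    Nonconformity is assumed here to be 1 - probability of the true class.
    """
    scores = defaultdict(list)
    for probs, label in zip(cal_probs, cal_labels):
        scores[label].append(1.0 - probs[label])
    return scores


def p_values(scores, probs):
    """Return one p-value per candidate label, using only that label's calibration scores."""
    result = {}
    for label, cal in scores.items():
        alpha = 1.0 - probs[label]
        at_least = sum(1 for s in cal if s >= alpha)
        result[label] = (at_least + 1) / (len(cal) + 1)
    return result


def prediction_set(scores, probs, significance):
    """Keep every label whose p-value exceeds the chosen significance level."""
    return {label for label, p in p_values(scores, probs).items() if p > significance}


# Hypothetical calibration data: model probabilities and known labels.
cal_probs = [
    {"active": 0.9, "inactive": 0.1},
    {"active": 0.7, "inactive": 0.3},
    {"active": 0.4, "inactive": 0.6},
    {"active": 0.2, "inactive": 0.8},
    {"active": 0.1, "inactive": 0.9},
    {"active": 0.6, "inactive": 0.4},
]
cal_labels = ["active", "active", "active", "inactive", "inactive", "inactive"]

scores = calibrate(cal_probs, cal_labels)
new_compound = {"active": 0.8, "inactive": 0.2}  # hypothetical model output

print(p_values(scores, new_compound))
print(prediction_set(scores, new_compound, significance=0.2))
print(prediction_set(scores, new_compound, significance=0.3))
```

Unlike a traditional QSAR classifier, which returns a single label, the conformal predictor returns a set of labels that may contain one class, both classes, or none, depending on the significance level chosen. In this toy case the stricter significance level of 0.2 leaves both labels in the set, while 0.3 narrows it to a single label.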
Pages: 16