Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening

Cited by: 3
Authors
Cao, Zhonglin [1 ]
Sciabola, Simone [1 ]
Wang, Ye [1 ]
Affiliation
[1] Biogen, Med Chem, Cambridge, MA 02142 USA
Keywords
MOLECULAR DOCKING; INHIBITOR; DISCOVERY; BINDING; GENERATION; DATABASE; ZINC;
DOI
10.1021/acs.jcim.3c01938
Chinese Library Classification
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As commercially available compound collections grow exponentially to the scale of billions, active learning and Bayesian optimization have recently proven effective at narrowing down the search space. An essential component of these methods is a surrogate machine learning model that predicts the desired properties of compounds. An accurate model achieves high sample efficiency by finding hits while only a fraction of the entire library is virtually screened. In this study, we examined the performance of a pretrained transformer-based language model and a graph neural network in a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50,000 compounds after screening only 0.6% of an ultralarge library containing 99.5 million compounds, an 8% improvement over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Pretrained models can boost the accuracy and sample efficiency of active learning-based virtual screening.
Pages: 1882-1891
Page count: 10
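
The screening loop described in the abstract can be illustrated with a toy sketch. By the abstract's figures, 0.6% of 99.5 million compounds is roughly 597,000 scored compounds, and 58.97% of the top 50,000 is about 29,485 recovered hits. The code below is not the authors' implementation: the `oracle` function is a hypothetical stand-in for an expensive docking score, the surrogate is a simple k-nearest-neighbour average rather than a pretrained transformer or graph neural network, and the greedy acquisition rule is an assumption, since the abstract does not name one. All function names are illustrative.

```python
import random


def oracle(x):
    # Hypothetical stand-in for an expensive docking score (lower is better).
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


def knn_predict(x, labelled, k=3):
    # Toy surrogate: mean score of the k nearest already-scored compounds.
    nearest = sorted(
        labelled,
        key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2,
    )[:k]
    return sum(score for _, score in nearest) / len(nearest)


def active_learning_screen(library, n_init, batch_size, n_rounds, seed=0):
    """Score a random initial batch, then repeatedly score the compounds
    the surrogate ranks best (greedy acquisition, an assumption here)."""
    rng = random.Random(seed)
    unlabelled = set(range(len(library)))
    scores = {}

    # Round 0: random initial sample to bootstrap the surrogate.
    for i in rng.sample(sorted(unlabelled), n_init):
        scores[i] = oracle(library[i])
    unlabelled -= set(scores)

    for _ in range(n_rounds):
        labelled = [(library[i], s) for i, s in scores.items()]
        # Rank every unscored compound by predicted score, best first.
        ranked = sorted(
            unlabelled, key=lambda i: knn_predict(library[i], labelled)
        )
        batch = ranked[:batch_size]
        for i in batch:
            scores[i] = oracle(library[i])
        unlabelled -= set(batch)
    return scores


def top_k_recall(library, scores, k):
    # Fraction of the true top-k compounds that were actually scored.
    order = sorted(range(len(library)), key=lambda i: oracle(library[i]))
    return len(set(order[:k]) & set(scores)) / k


# Synthetic "library" of 2,000 two-dimensional feature vectors.
rng = random.Random(42)
library = [(rng.random(), rng.random()) for _ in range(2000)]
scores = active_learning_screen(library, n_init=50, batch_size=50, n_rounds=5)
recall = top_k_recall(library, scores, 50)
print(f"screened {len(scores) / len(library):.1%}, top-50 recall {recall:.2f}")
```

The metric printed at the end mirrors the one quoted in the abstract: the share of the true top-k compounds found after scoring only a fraction of the library. A more accurate surrogate should raise that share for the same scoring budget, which is the sense in which the abstract uses "sample efficiency".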