Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening

Times cited: 3
Authors
Cao, Zhonglin [1]
Sciabola, Simone [1]
Wang, Ye [1]
Affiliation
[1] Biogen, Med Chem, Cambridge, MA 02142 USA
Keywords
MOLECULAR DOCKING; INHIBITOR; DISCOVERY; BINDING; GENERATION; DATABASE; ZINC
DOI
10.1021/acs.jcim.3c01938
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As commercially available compound collections grow exponentially toward the scale of billions, active learning and Bayesian optimization have recently proven to be effective methods for narrowing down the search space. An essential component of these methods is a surrogate machine learning model that predicts the desired properties of compounds. An accurate surrogate achieves high sample efficiency by finding hits after only a fraction of the entire library has been virtually screened. In this study, we examined the performance of a pretrained transformer-based language model and a pretrained graph neural network within a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50,000 compounds after screening only 0.6% of an ultralarge library containing 99.5 million compounds, an 8% improvement over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Pretrained models can therefore boost the accuracy and sample efficiency of active learning-based virtual screening.
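The abstract describes an active-learning loop in which a surrogate model ranks not-yet-docked compounds so that only a small fraction of the library is ever docked. At the reported numbers, 0.6% of 99.5 million compounds is about 597,000 docking calculations, and 58.97% of the top 50,000 is about 29,485 recovered hits. The Python sketch below is a minimal, hypothetical illustration of such a loop under stated assumptions, not the authors' implementation: `dock` is an invented analytic stand-in for a docking program (lower score is better), `knn_predict` is a toy k-nearest-neighbor surrogate substituted for the pretrained transformer or graph neural network, and acquisition is plain greedy selection on predicted scores, without the uncertainty term a full Bayesian optimization acquisition function would use.

```python
import heapq
import math
import random


def dock(x):
    # Hypothetical stand-in for an expensive docking run; lower score = better binder.
    return (x[0] - 0.7) ** 2 + (x[1] - 0.2) ** 2


def knn_predict(query, training, k=5):
    # Toy surrogate (NOT the paper's model): mean score of the k nearest docked compounds.
    nearest = heapq.nsmallest(k, training, key=lambda item: math.dist(query, item[0]))
    return sum(score for _, score in nearest) / len(nearest)


def active_learning_screen(library, batch_size, n_rounds, seed=0):
    # Round 0 docks a random batch; later rounds dock the compounds with the
    # lowest (best) surrogate-predicted scores, i.e. greedy acquisition.
    rng = random.Random(seed)
    unlabeled = set(range(len(library)))
    docked = {}  # compound index -> docking score
    batch = rng.sample(range(len(library)), batch_size)
    for round_idx in range(n_rounds):
        for i in batch:
            docked[i] = dock(library[i])
            unlabeled.discard(i)
        if round_idx == n_rounds - 1:
            break  # budget spent; no need to retrain the surrogate
        training = [(library[i], s) for i, s in docked.items()]
        preds = {i: knn_predict(library[i], training) for i in unlabeled}
        batch = sorted(unlabeled, key=lambda i: (preds[i], i))[:batch_size]
    return docked


def top_k_recall(selected_ids, true_scores, k):
    # Fraction of the true top-k compounds (lowest scores) that were actually docked.
    true_top = set(sorted(range(len(true_scores)), key=lambda i: true_scores[i])[:k])
    return len(true_top & set(selected_ids)) / k


def screening_budget(library_size, fraction):
    # Number of compounds docked when screening a given fraction of the library.
    return round(library_size * fraction)


if __name__ == "__main__":
    lib_rng = random.Random(42)
    library = [(lib_rng.random(), lib_rng.random()) for _ in range(1000)]
    true_scores = [dock(x) for x in library]
    docked = active_learning_screen(library, batch_size=20, n_rounds=5)
    random_ids = random.Random(1).sample(range(len(library)), len(docked))
    print("compounds docked:", len(docked))
    print("active learning top-50 recall:", top_k_recall(docked, true_scores, 50))
    print("random top-50 recall:", top_k_recall(random_ids, true_scores, 50))
    print("budget for 0.6% of 99.5M:", screening_budget(99_500_000, 0.006))
```

In this sketch, `top_k_recall` is the "fraction of the top-k found" metric behind the 58.97% figure, and replacing `knn_predict` with a stronger surrogate is the only change needed to compare models under the same docking budget.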
Pages: 1882-1891
Page count: 10