Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding

Cited by: 2
Authors
Hadfield, Thomas E. [1]
Scantlebury, Jack [1]
Deane, Charlotte M. [1]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford Prot Informat Grp, Oxford, England
Keywords
Structure-based virtual screening; Machine learning; Interpretability; PROTEIN; DOCKING;
DOI
10.1186/s13321-023-00755-3
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Many recently proposed structure-based virtual screening models appear to be able to accurately distinguish high-affinity binders from non-binders. However, several recent studies have shown that they often do so by exploiting ligand-specific biases in the dataset, rather than identifying favourable intermolecular interactions in the input protein-ligand complex. In this work we propose a novel approach for assessing the extent to which machine learning-based virtual screening models are able to identify the functional groups responsible for binding. To sidestep the difficulty in establishing the ground truth importance of each atom of a large-scale set of protein-ligand complexes, we propose a protocol for generating synthetic data. Each ligand in the dataset is surrounded by a randomly sampled point cloud of pharmacophores, and the label assigned to the synthetic protein-ligand complex is determined by a three-dimensional deterministic binding rule. This allows us to precisely quantify the ground truth importance of each atom and compare it to the model-generated attributions. Using our generated datasets, we demonstrate that a recently proposed deep learning-based virtual screening model, PointVS, identified the most important functional groups 39% more efficiently than a fingerprint-based random forest, suggesting that it would generalise more effectively to new examples. In addition, we found that ligand-specific biases, such as those present in widely used virtual screening datasets, substantially impaired the ability of all ML models to identify the most important functional groups. We have made our synthetic data generation framework available to facilitate the benchmarking of new virtual screening models. Code is available at https://github.com/tomhadfield95/synthVS.
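The synthetic labelling idea in the abstract can be illustrated with a toy sketch. This is not the synthVS implementation: the function names, the donor/acceptor flags, the 4.0 distance cutoff, and the specific rule (a complex is a "binder" if any ligand donor atom lies within the cutoff of an acceptor pharmacophore) are all assumptions chosen for illustration. The point is only that a deterministic rule yields both a label and an exact per-atom ground truth.

```python
import numpy as np


def sample_point_cloud(rng, centre, n_points, half_width=10.0):
    """Sample a random pharmacophore cloud around `centre` (hypothetical helper).

    Returns coordinates of shape (n_points, 3) and a boolean acceptor flag per point.
    """
    xyz = centre + rng.uniform(-half_width, half_width, size=(n_points, 3))
    is_acceptor = rng.random(n_points) < 0.5
    return xyz, is_acceptor


def label_complex(ligand_xyz, ligand_is_donor, pharm_xyz, pharm_is_acceptor, cutoff=4.0):
    """Apply an assumed deterministic binding rule.

    A ligand atom is 'important' if it is a donor and at least one acceptor
    pharmacophore lies within `cutoff` of it. The complex is labelled 1 if
    any atom is important, else 0.
    """
    # Pairwise distances between ligand atoms (rows) and pharmacophores (columns).
    dists = np.linalg.norm(ligand_xyz[:, None, :] - pharm_xyz[None, :, :], axis=-1)
    satisfied = (dists <= cutoff) & ligand_is_donor[:, None] & pharm_is_acceptor[None, :]
    important = satisfied.any(axis=1)
    return int(important.any()), important


# Hand-built example: atom 0 is a donor next to an acceptor, atom 1 is not a donor.
ligand_xyz = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
ligand_is_donor = np.array([True, False])
pharm_xyz = np.array([[1.0, 0.0, 0.0], [5.0, 1.0, 0.0]])
pharm_is_acceptor = np.array([True, True])

label, important = label_complex(ligand_xyz, ligand_is_donor, pharm_xyz, pharm_is_acceptor)
```

Because the rule is known, the `important` mask can be compared directly with a model's atom attributions, which is the kind of comparison the abstract describes.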
Pages: 15