Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding

Cited by: 2
Authors
Hadfield, Thomas E. [1]
Scantlebury, Jack [1]
Deane, Charlotte M. [1]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford Prot Informat Grp, Oxford, England
Keywords
Structure-based virtual screening; Machine learning; Interpretability; PROTEIN; DOCKING
DOI
10.1186/s13321-023-00755-3
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Many recently proposed structure-based virtual screening models appear to distinguish high-affinity binders from non-binders accurately. However, several recent studies have shown that they often do so by exploiting ligand-specific biases in the dataset, rather than by identifying favourable intermolecular interactions in the input protein-ligand complex. In this work we propose a novel approach for assessing the extent to which machine learning-based virtual screening models are able to identify the functional groups responsible for binding. To sidestep the difficulty of establishing the ground truth importance of each atom in a large-scale set of protein-ligand complexes, we propose a protocol for generating synthetic data. Each ligand in the dataset is surrounded by a randomly sampled point cloud of pharmacophores, and the label assigned to the synthetic protein-ligand complex is determined by a 3-dimensional deterministic binding rule. This allows us to precisely quantify the ground truth importance of each atom and compare it to the model-generated attributions. Using our generated datasets, we demonstrate that a recently proposed deep learning-based virtual screening model, PointVS, identified the most important functional groups 39% more efficiently than a fingerprint-based random forest, suggesting that it would generalise more effectively to new examples. In addition, we found that ligand-specific biases, such as those present in widely used virtual screening datasets, substantially impaired the ability of all ML models to identify the most important functional groups. We have made our synthetic data generation framework available to facilitate the benchmarking of new virtual screening models. Code is available at https://github.com/tomhadfield95/synthVS.
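The synthetic-data idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the synthVS code: the pharmacophore types (donor, acceptor), the 4.0 angstrom cutoff, the toy binding rule (a ligand atom is important if a complementary pharmacophore lies within the cutoff), the hypothetical function names, and the top-k overlap score used to compare attributions with the ground truth.

```python
import math
import random

# Assumed complementary pharmacophore pairs (illustrative, not from synthVS).
COMPLEMENTARY = {("donor", "acceptor"), ("acceptor", "donor")}
CUTOFF = 4.0  # assumed distance threshold in angstroms


def sample_point_cloud(n_points, box_size, rng):
    """Sample typed pharmacophore points uniformly in a cube centred on the origin."""
    half = box_size / 2.0
    return [
        (rng.choice(["donor", "acceptor"]),
         tuple(rng.uniform(-half, half) for _ in range(3)))
        for _ in range(n_points)
    ]


def label_complex(ligand_atoms, cloud, cutoff=CUTOFF):
    """Apply the toy deterministic rule.

    Returns (label, important), where important is the set of ligand atom
    indices with a complementary pharmacophore within the cutoff, and
    label is 1 if that set is non-empty, else 0.
    """
    important = set()
    for i, (lig_type, lig_xyz) in enumerate(ligand_atoms):
        for cloud_type, cloud_xyz in cloud:
            close = math.dist(lig_xyz, cloud_xyz) <= cutoff
            if close and (lig_type, cloud_type) in COMPLEMENTARY:
                important.add(i)
                break
    return int(bool(important)), important


def top_k_overlap(attributions, important):
    """Fraction of truly important atoms found among the k highest-scored
    atoms, with k = len(important). Returns None if no atom is important."""
    if not important:
        return None
    k = len(important)
    ranked = sorted(range(len(attributions)),
                    key=lambda i: attributions[i], reverse=True)
    return len(set(ranked[:k]) & important) / k


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical three-atom ligand: (pharmacophore type, xyz coordinates).
    ligand = [("donor", (0.0, 0.0, 0.0)),
              ("none", (1.5, 0.0, 0.0)),
              ("acceptor", (3.0, 0.0, 0.0))]
    cloud = sample_point_cloud(20, 12.0, rng)
    label, important = label_complex(ligand, cloud)
    print(label, sorted(important))
```

Because the label follows from an explicit rule, the set of important atoms is known exactly for every synthetic complex, which is what makes a direct comparison with per-atom model attributions possible.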
Pages: 15