A practical guide to machine-learning scoring for structure-based virtual screening

Authors
Viet-Khoa Tran-Nguyen
Muhammad Junaid
Saw Simeon
Pedro J. Ballester
Affiliations
[1] Centre de Recherche en Cancérologie de Marseille, Department of Bioengineering
[2] Imperial College London
Source
Nature Protocols | 2023 / Vol. 18, No. 11
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
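The abstract describes partitioning each target's data into a training set and a test set (step iii) and evaluating scoring functions by how well they rank actives above inactives (steps i and iv), but it does not specify the splitting scheme or the performance metric the protocol uses. As a hedged illustration only, the Python sketch below assumes a stratified random split and the enrichment factor at the top x% of a ranked library (EF), a metric commonly reported in SBVS studies. The function names `stratified_split` and `enrichment_factor` are hypothetical and are not taken from the MLSF-protocol repository.

```python
import math
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split molecule indices into (train, test), keeping the
    active/inactive ratio about the same in both parts.

    Assumption: a simple stratified random split; the protocol's own
    partitioning strategy may differ. labels: 1 = active, 0 = inactive.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible partition
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def enrichment_factor(
    scores: Sequence[float],
    labels: Sequence[int],
    top_fraction: float = 0.01,
    higher_is_better: bool = True,
) -> float:
    """EF at the top `top_fraction` of the ranked library: the hit rate
    among the top-ranked molecules divided by the hit rate of the whole
    library (a random ranking gives about 1.0 on average).
    """
    n = len(scores)
    n_actives = sum(labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")
    n_top = max(1, math.ceil(n * top_fraction))
    # Rank molecules from best to worst predicted score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_better)
    hits_top = sum(labels[i] for i in order[:n_top])
    return (hits_top / n_top) / (n_actives / n)


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 990        # toy data: 10 actives, 990 inactives
    perfect = [1.0] * 10 + [0.0] * 990   # every active scored above every inactive
    print(enrichment_factor(perfect, labels))  # EF at the top 1%
    train, test = stratified_split(labels, test_fraction=0.2, seed=1)
    print(len(train), len(test), sum(labels[i] for i in test))
```

The `higher_is_better` flag is there because Smina reports docking scores as estimated affinities in kcal/mol, where more negative is better, whereas classifier-style ML scores are typically ranked with higher meaning more likely active. With 10 actives among 1,000 molecules, a ranking that places all 10 actives in the top 1% gives EF1% = 100, the maximum possible for this library, while a random ranking is expected to give about 1.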
Pages: 3460–3511