A practical guide to machine-learning scoring for structure-based virtual screening

Cited by: 0

Authors:

Viet-Khoa Tran-Nguyen

Muhammad Junaid

Saw Simeon

Pedro J. Ballester

Affiliations:

[1] Centre de Recherche en Cancérologie de Marseille

[2] Department of Bioengineering, Imperial College London

Source:

Nature Protocols | 2023 / Volume 18

Abstract:

Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
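As an illustration of step (i), evaluating a scoring function on a benchmark of known actives and inactives, the following minimal Python sketch computes an enrichment factor from a ranked list. The choice of metric, the function name `enrichment_factor`, and the toy scores and labels are assumptions made for this example; they are not taken from the abstract or from the MLSF-protocol repository. The sketch also assumes that a higher score means a more likely active, so docking scores in which lower is better (for example, energies in kcal/mol) would need to be negated first.

```python
import math


def enrichment_factor(scores, labels, top_fraction=0.01):
    """Enrichment factor at a given top fraction of the ranked library.

    scores: predicted scores, higher = more likely active (assumption).
    labels: 1 for a true active, 0 for an inactive or decoy.
    Returns the hit rate in the top-ranked subset divided by the hit
    rate in the whole library.
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equally long")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("the library must contain at least one active")

    # Number of molecules in the top-ranked subset (at least one).
    n_top = max(1, math.ceil(top_fraction * n_total))

    # Rank molecules from best to worst predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])

    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy example (hypothetical data): 10 molecules, 3 of them active.
toy_scores = [9.1, 8.7, 7.5, 6.0, 5.5, 4.2, 3.9, 3.1, 2.0, 1.2]
toy_labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
toy_ef = enrichment_factor(toy_scores, toy_labels, top_fraction=0.2)
```

An enrichment factor above 1 means the top of the ranking contains proportionally more actives than the library as a whole; a value of 1 corresponds to random selection.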

Pages: 3460-3511

Page count: 51

Citation: Tran-Nguyen, Viet-Khoa; Junaid, Muhammad; Simeon, Saw; Ballester, Pedro J. A practical guide to machine-learning scoring for structure-based virtual screening. Nature Protocols, 2023, 18 (11): 3460-3511.