A Small Step Toward Generalizability: Training a Machine Learning Scoring Function for Structure-Based Virtual Screening

Cited by: 13
Authors
Scantlebury, Jack [1]
Vost, Lucy [1]
Carbery, Anna [1,2]
Hadfield, Thomas E. [1]
Turnbull, Oliver M. [1]
Brown, Nathan [3,4]
Chenthamarakshan, Vijil [5]
Das, Payel [5]
Grosjean, Harold [6]
von Delft, Frank [2,7,8,9]
Deane, Charlotte M. [1]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford OX1 2JD, England
[2] Diamond Light Source Ltd, Didcot OX11 0DE, England
[3] BenevolentAI, London W1T 5HD, England
[4] Healx, Charter House, Cambridge, England
[5] IBM Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[6] Univ Oxford, Struct Genom Consortium, Oxford OX3 7DQ, England
[7] Univ Oxford, Ctr Med Discovery, Oxford OX3 7DQ, England
[8] Univ Johannesburg, Dept Biochem, ZA-2006 Johannesburg, South Africa
[9] Res Complex Harwell, Didcot OX11 0FA, England
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
CONVOLUTIONAL NEURAL-NETWORK; INFORMATION;
DOI
10.1021/acs.jcim.3c00322
Chinese Library Classification number
R914 [Medicinal chemistry];
Discipline classification code
100701 ;
Abstract
Over the past few years, many machine learning-based scoring functions for predicting the binding of small molecules to proteins have been developed. Their objective is to approximate the distribution which takes two molecules as input and outputs the energy of their interaction. Only a scoring function that accounts for the interatomic interactions involved in binding can accurately predict binding affinity on unseen molecules. However, many scoring functions make predictions based on data set biases rather than an understanding of the physics of binding. These scoring functions perform well when tested on similar targets to those in the training set but fail to generalize to dissimilar targets. To test what a machine learning-based scoring function has learned, input attribution, a technique for learning which features are important to a model when making a prediction on a particular data point, can be applied. If a model successfully learns something beyond data set biases, attribution should give insight into the important binding interactions that are taking place. We built a machine learning-based scoring function that aimed to avoid the influence of bias via thorough train and test data set filtering and show that it achieves comparable performance on the Comparative Assessment of Scoring Functions, 2016 (CASF-2016) benchmark to other leading methods. We then use the CASF-2016 test set to perform attribution and find that the bonds identified as important by PointVS, unlike those extracted from other scoring functions, have a high correlation with those found by a distance-based interaction profiler. We then show that attribution can be used to extract important binding pharmacophores from a given protein target when supplied with a number of bound structures. We use this information to perform fragment elaboration and see improvements in docking scores compared to using structural information from a traditional, data-based approach. This not only provides definitive proof that the scoring function has learned to identify some important binding interactions but also constitutes the first deep learning-based method for extracting structural information from a target for molecule design.
Pages: 2960-2974
Number of pages: 15