Can machine learning consistently improve the scoring power of classical scoring functions? Insights into the role of machine learning in scoring functions

Times cited: 50
Authors
Shen, Chao [1]
Hu, Ye
Wang, Zhe [1]
Zhang, Xujun [1]
Zhong, Haiyang [1,2]
Wang, Gaoang [1]
Yao, Xiaojun [2,3]
Xu, Lei [4]
Cao, Dongsheng [6]
Hou, Tingjun [5]
Affiliations
[1] Zhejiang Univ, Coll Pharmaceut Sci, Hangzhou, Peoples R China
[2] Lanzhou Univ, Coll Chem & Chem Engn, Lanzhou, Peoples R China
[3] Macau Univ Sci & Technol, Macau Inst Appl Res Med & Hlth, State Key Lab Qual Res Chinese Med, Macau, Peoples R China
[4] Jiangsu Univ Technol, Inst Bioinformat & Med Engn, Changzhou, Peoples R China
[5] Zhejiang Univ, Coll Pharmaceut Sci, Hangzhou Inst Innovat Med, Hangzhou 310058, Zhejiang, Peoples R China
[6] Cent South Univ, Xiangya Sch Pharmaceut Sci, Changsha 410013, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
scoring function (SF); machine learning (ML); scoring power; binding affinity; ML-based SF; BINDING-AFFINITY PREDICTION; PROTEIN-LIGAND INTERACTIONS; OUT CROSS-VALIDATION; RANDOM FOREST; DOCKING; APPROPRIATE; COMPLEXES; DISCOVERY; DATABASE; ACCURACY
DOI
10.1093/bib/bbz173
CLC classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Accurately estimating protein-ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, the binding affinities predicted by classical scoring functions (SFs) have been shown to correlate poorly with experimentally measured biological activities. In recent years, machine learning (ML)-based SFs have emerged as promising alternatives and have outperformed classical SFs in a series of studies. In this study, to better understand the potential of classical SFs, we conducted a comparative assessment of 25 commonly used SFs. Specifically, we replaced the original multiple linear regression (MLR) used to weight each SF's individual energy terms with state-of-the-art ML methods, refit those terms, and systematically evaluated the resulting scoring power. The newly developed ML-based SFs consistently outperformed their classical counterparts; in particular, gradient boosting decision trees (GBDT) and random forest (RF) gave the best predictions in most cases. The ML-based SFs were also tested on a second benchmark modified from PDBbind v2007, and the effects of structural and sequence similarity were evaluated. The results indicated that the superiority of the ML-based SFs was fully assured when the training set contained sufficient targets similar to those being predicted. Moreover, we explored combining features from multiple SFs and found that combining NNscore2.0 with one to four other classical SFs yielded the best scoring power. However, it was not possible to derive a generic target-specific SF or SF combination.
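The central experiment keeps each SF's energy terms fixed and swaps the linear regression that weights them for a nonlinear learner. A toy sketch of that idea, under loudly stated assumptions: the three energy terms (`vdw`, `hbond`, `desolv`), the synthetic affinity formula, the dataset sizes and all hyperparameters are hypothetical; the data are synthetic, not PDBbind; and GBDT is reduced to boosted depth-1 regression stumps in plain Python, a simplified stand-in rather than the implementation used in the study. Scoring power is measured as the Pearson correlation between predicted and observed affinities on held-out complexes.

```python
import math
import random
import statistics


def pearson_r(pred, obs):
    """Pearson correlation coefficient, used here as the scoring-power metric."""
    mp, mo = statistics.fmean(pred), statistics.fmean(obs)
    sxy = sum((a - mp) * (b - mo) for a, b in zip(pred, obs))
    sxx = sum((a - mp) ** 2 for a in pred)
    syy = sum((b - mo) ** 2 for b in obs)
    return sxy / math.sqrt(sxx * syy)


def fit_mlr(X, y):
    """Classical-style refit: ordinary least squares with an intercept."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented normal equations [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    w = [M[i][p] / M[i][i] for i in range(p)]
    return lambda row: w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def fit_stump(X, res):
    """Depth-1 regression tree minimising squared error on the residuals."""
    n, total, best = len(X), sum(res), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left = 0.0
        for k in range(1, n):
            left += res[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right = total - left
            # Minimising SSE is equivalent to maximising S_L^2/n_L + S_R^2/n_R.
            score = left ** 2 / k + right ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, j, (lo + hi) / 2, left / k, right / (n - k))
    _, feat, thr, left_val, right_val = best
    return lambda row: left_val if row[feat] <= thr else right_val


def fit_gbdt(X, y, n_rounds=200, lr=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = statistics.fmean(y)
    pred, stumps = [base] * len(y), []
    for _ in range(n_rounds):
        stump = fit_stump(X, [t - p for t, p in zip(y, pred)])
        stumps.append(stump)
        pred = [p + lr * stump(row) for p, row in zip(pred, X)]
    return lambda row: base + lr * sum(s(row) for s in stumps)


def make_dataset(n, rng):
    """Hypothetical per-complex energy terms and a nonlinear synthetic affinity."""
    X, y = [], []
    for _ in range(n):
        vdw, hbond, desolv = (rng.uniform(-2, 2) for _ in range(3))
        X.append([vdw, hbond, desolv])
        y.append(1.5 * math.tanh(2 * vdw) + hbond ** 2 - 0.5 * desolv
                 + rng.gauss(0, 0.3))
    return X, y


rng = random.Random(0)
X_train, y_train = make_dataset(300, rng)
X_test, y_test = make_dataset(200, rng)
mlr, gbdt = fit_mlr(X_train, y_train), fit_gbdt(X_train, y_train)
r_mlr = pearson_r([mlr(x) for x in X_test], y_test)
r_gbdt = pearson_r([gbdt(x) for x in X_test], y_test)
print(f"MLR  r = {r_mlr:.3f}")
print(f"GBDT r = {r_gbdt:.3f}")
```

Because `hbond` enters the synthetic target quadratically, a linear refit of the same terms cannot capture it, whereas the boosted stumps approximate each term's nonlinear response. On real complexes the size of the gap also depends on how similar the training targets are to the test targets, which is the effect the abstract reports for the PDBbind v2007-derived benchmark.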
Pages: 497-514
Page count: 18