Can machine learning consistently improve the scoring power of classical scoring functions? Insights into the role of machine learning in scoring functions

Cited by: 50

Authors:

Shen, Chao ^{[1]}

Hu, Ye

Wang, Zhe ^{[1]}

Zhang, Xujun ^{[1]}

Zhong, Haiyang ^{[1,2]}

Wang, Gaoang ^{[1]}

Yao, Xiaojun ^{[2,3]}

Xu, Lei ^{[4]}

Cao, Dongsheng ^{[6]}

Hou, Tingjun ^{[5]}

Affiliations:

[1] Zhejiang Univ, Coll Pharmaceut Sci, Hangzhou, Peoples R China

[2] Lanzhou Univ, Coll Chem & Chem Engn, Lanzhou, Peoples R China

[3] Macau Univ Sci & Technol, Macau Inst Appl Res Med & Hlth, State Key Lab Qual Res Chinese Med, Macau, Peoples R China

[4] Jiangsu Univ Technol, Inst Bioinformat & Med Engn, Changzhou, Peoples R China

[5] Zhejiang Univ, Coll Pharmaceut Sci, Hangzhou Inst Innovat Med, Hangzhou 310058, Zhejiang, Peoples R China

[6] Cent South Univ, Xiangya Sch Pharmaceut Sci, Changsha 410013, Hunan, Peoples R China

Source:

BRIEFINGS IN BIOINFORMATICS | 2021 / Vol. 22 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

scoring function (SF); machine learning (ML); scoring power; binding affinity; ML-based SF; BINDING-AFFINITY PREDICTION; PROTEIN-LIGAND INTERACTIONS; OUT CROSS-VALIDATION; RANDOM FOREST; DOCKING; APPROPRIATE; COMPLEXES; DISCOVERY; DATABASE; ACCURACY;

DOI:

10.1093/bib/bbz173

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

How to accurately estimate protein-ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, it has been shown that the binding affinities predicted by classical scoring functions (SFs) do not correlate well with experimentally measured biological activities. In the past few years, machine learning (ML)-based SFs have gradually emerged as potential alternatives and outperformed classical SFs in a series of studies. In this study, to better recognize the potential of classical SFs, we have conducted a comparative assessment of 25 commonly used SFs. Accordingly, the scoring power was systematically estimated by using state-of-the-art ML methods that replaced the original multiple linear regression method to refit individual energy terms. The results show that the newly developed ML-based SFs consistently performed better than classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarities were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed when the training set contained sufficient similar targets. Moreover, the effect of combining features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs could yield the best scoring power. However, it was not feasible to derive a generic target-specific SF or SF combination.


Pages: 497-514

Number of pages: 18

50 items in total

[1] Ensemble machine learning to improve scoring functions
Wang, Xiang
[J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2017, 254
[2] Influence of Data Similarity on the Scoring Power of Machine-learning Scoring Functions for Docking
Sze, Kam-Heung
Xiong, Zhiqiang
Ma, Jinlong
Lu, Gang
Chan, Wai-Yee
Li, Hongjian
[J]. PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 3: BIOINFORMATICS, 2020, : 85 - 92
[3] Delta Machine Learning to Improve Scoring-Ranking-Screening Performances of Protein-Ligand Scoring Functions
Yang, Chao
Zhang, Yingkai
[J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2022, 62 (11) : 2696 - 2712
[4] Feature Selection Investigation in Machine Learning Docking Scoring Functions
Balboni, Mauricio Dorneles Caldeira
Arrua, Oscar Emilio
Werhli, Adriano V.
Machado, Karina dos Santos
[J]. ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, BSB 2023, 2023, 13954 : 58 - 69
[5] Comparative assessment of machine-learning scoring functions on PDBbind 2013
Khamis, Mohamed A.
Gomaa, Walid
[J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2015, 45 : 136 - 151
[6] Machine Learning-Based Scoring Functions, Development and Applications with SAnDReS
Bitencourt-Ferreira, Gabriela
Rizzotto, Camila
de Azevedo Junior, Walter Filgueira
[J]. CURRENT MEDICINAL CHEMISTRY, 2021, 28 (09) : 1746 - 1756
[7] MetaScore: A Novel Machine-Learning-Based Approach to Improve Traditional Scoring Functions for Scoring Protein-Protein Docking Conformations
Jung, Yong
Geng, Cunliang
Bonvin, Alexandre M. J. J.
Xue, Li C.
Honavar, Vasant G.
[J]. BIOMOLECULES, 2023, 13 (01)
[8] Machine Learning Scoring Functions Based on Random Forest and Support Vector Regression
Ballester, Pedro J.
[J]. PATTERN RECOGNITION IN BIOINFORMATICS, 2012, 7632 : 14 - 25
[9] Machine-learning scoring functions for structure-based virtual screening
Li, Hongjian
Sze, Kam-Heung
Lu, Gang
Ballester, Pedro J.
[J]. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE, 2021, 11 (01)
[10] New machine learning and physics-based scoring functions for drug discovery
Guedes, Isabella A.
Barreto, André M. S.
Marinho, Diogo
Krempser, Eduardo
Kuenemann, Mélaine A.
Sperandio, Olivier
Dardenne, Laurent E.
Miteva, Maria A.
[J]. Scientific Reports, 11
