TB-IECS: an accurate machine learning-based scoring function for virtual screening

Cited by: 11
Authors
Zhang, Xujun [1]
Shen, Chao [1]
Jiang, Dejun [1]
Zhang, Jintu [1]
Ye, Qing [1]
Xu, Lei [2]
Hou, Tingjun [1]
Pan, Peichen [1]
Kang, Yu [1]
Affiliations
[1] Zhejiang Univ, Innovat Inst Artificial Intelligence Med, Coll Pharmaceut Sci, Hangzhou 310058, Zhejiang, Peoples R China
[2] Jiangsu Univ Technol, Inst Bioinformat & Med Engn, Sch Elect & Informat Engn, Changzhou 213001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scoring function; Machine learning; Virtual screening; Theory-based interaction energy component; PROTEIN-LIGAND DOCKING; GENETIC ALGORITHM; BINDING-AFFINITY; FORCE-FIELD; VALIDATION; PREDICTION; GLIDE;
DOI
10.1186/s13321-023-00731-x
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Machine learning-based scoring functions (MLSFs) have shown potential for improving virtual screening capabilities over classical scoring functions (SFs). Due to the high computational cost of feature generation, the number of descriptors used in MLSFs and the characterization of protein-ligand interactions are always limited, which may affect the overall accuracy and efficiency. Here, we propose a new SF called TB-IECS (theory-based interaction energy component score), which combines energy terms from Smina and NNScore version 2 and uses the eXtreme Gradient Boosting (XGBoost) algorithm for model training. In this study, the energy terms decomposed from 15 traditional SFs were first categorized based on their formulas and physicochemical principles, and 324 feature combinations were generated accordingly. The five best feature combinations were selected for further evaluation of model performance with regard to the selection of feature vectors of various lengths, interaction types and ML algorithms. The virtual screening power of TB-IECS was assessed on the DUD-E and LIT-PCBA datasets, as well as on seven target-specific datasets from the ChemDiv database. The results showed that TB-IECS outperformed classical SFs including Glide SP and Dock, and effectively balanced efficiency and accuracy for practical virtual screening.
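The abstract reports that virtual screening power was assessed on benchmark datasets but does not name the metric in this record. As a minimal sketch, assuming an enrichment-factor style evaluation (a common choice for such benchmarks, not confirmed here as the authors' metric), the function below ranks compounds by predicted score and compares the active rate in the top fraction with the active rate in the whole library. The function name, the `fraction` parameter, and the convention that a higher score means "more likely active" are all illustrative assumptions.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    labels: Sequence[int],
    fraction: float = 0.01,
) -> float:
    """Enrichment factor of a ranked list at a given top fraction.

    Assumptions (hypothetical, not taken from the paper):
    - higher score means the compound is predicted more likely active
    - labels are 1 for actives and 0 for decoys/inactives
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    # Size of the top-ranked subset; always inspect at least one compound.
    n_top = max(1, int(n_total * fraction))

    # Rank compounds from best (highest) to worst (lowest) score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])

    # Active rate in the top subset divided by the active rate overall.
    return (hits_in_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy data: 10 compounds, 2 actives, both ranked at the top.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.2))
```

A value above 1 means the scoring function concentrates actives near the top of the ranking better than random selection would. If a scoring function reports lower-is-better energies, the scores would need to be negated before calling this sketch.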
Pages: 17
Related papers
50 in total
  • [1] TB-IECS: an accurate machine learning-based scoring function for virtual screening
    Xujun Zhang
    Chao Shen
    Dejun Jiang
    Jintu Zhang
    Qing Ye
    Lei Xu
    Tingjun Hou
    Peichen Pan
    Yu Kang
    Journal of Cheminformatics, 15
  • [2] Beware of the generic machine learning-based scoring functions in structure-based virtual screening
    Shen, Chao
    Hu, Ye
    Wang, Zhe
    Zhang, Xujun
    Pang, Jinping
    Wang, Gaoang
    Zhong, Haiyang
    Xu, Lei
    Cao, Dongsheng
    Hou, Tingjun
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (03)
  • [3] A Global Machine Learning-Based Scoring Function For Protein Structure Prediction
    Kloczkowski, Andrzej
    Faraggi, Eshel
    PROTEIN SCIENCE, 2014, 23 : 244 - 244
  • [4] A Machine Learning-Based Approach for Virtual Network Function Modeling
    Mestres, Albert
    Alarcon, Eduard
    Cabellos, Albert
    2018 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE WORKSHOPS (WCNCW), 2018, : 237 - 241
  • [5] MedusaScore: An accurate force field-based scoring function for virtual drug screening
    Yin, Shuangye
    Biedermannova, Lada
    Vondrasek, Jiri
    Dokholyan, Nikolay V.
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2008, 48 (08) : 1656 - 1662
  • [6] Machine-learning scoring functions for structure-based virtual screening
    Li Hongjian
    Sze, Kam-Heung
    Lu Gang
    Ballester, Pedro J.
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE, 2021, 11 (01)
  • [7] Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
    Cang, Zixuan
    Mu, Lin
    Wei, Guo-Wei
    PLOS COMPUTATIONAL BIOLOGY, 2018, 14 (01)
  • [8] A Small Step Toward Generalizability: Training a Machine Learning Scoring Function for Structure-Based Virtual Screening
    Scantlebury, Jack
    Vost, Lucy
    Carbery, Anna
    Hadfield, Thomas E.
    Turnbull, Oliver M.
    Brown, Nathan
    Chenthamarakshan, Vijil
    Das, Payel
    Grosjean, Harold
    von Delft, Frank
    Deane, Charlotte M.
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2023, 63 (10) : 2960 - 2974
  • [9] Machine Learning-Based Virtual Screening for the Identification of Cdk5 Inhibitors
    Di Stefano, Miriana
    Galati, Salvatore
    Ortore, Gabriella
    Caligiuri, Isabella
    Rizzolio, Flavio
    Ceni, Costanza
    Bertini, Simone
    Bononi, Giulia
    Granchi, Carlotta
    Macchia, Marco
    Poli, Giulio
    Tuccinardi, Tiziano
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2022, 23 (18)
  • [10] The influence of negative training set size on machine learning-based virtual screening
    Rafał Kurczab
    Sabina Smusz
    Andrzej J Bojarski
    Journal of Cheminformatics, 6