Predicting the Accuracy of Ligand Overlay Methods with Random Forest Models

Cited by: 5
Authors
Nandigam, Ravi K. [2]
Evans, David A. [3]
Erickson, Jon A. [4]
Kim, Sangtae [2]
Sutherland, Jeffrey J. [1]
Affiliations
[1] Eli Lilly & Co, Discovery Informat, Indianapolis, IN 46285 USA
[2] Purdue Univ, Sch Chem Engn, W Lafayette, IN 47907 USA
[3] Lilly Res Ctr, Surrey, England
[4] Lilly Res Labs, Indianapolis, IN USA
DOI
10.1021/ci800216f
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
The accuracy of binding mode prediction using standard molecular overlay methods (ROCS, FlexS, Phase, and FieldCompare) is studied. Previous work has shown that simple decision tree modeling can be used to improve accuracy by selecting the best overlay template. This concept is extended to the use of Random Forest (RF) modeling for template and algorithm selection. An extensive data set of 815 ligand-bound X-ray structures representing 5 gene families was used to generate ca. 70,000 overlays using four programs. RF models, trained using standard measures of ligand and protein similarity and Lipinski-related descriptors, are used to automatically select the reference ligand and overlay method that maximize the probability of reproducing the overlay deduced from X-ray structures (i.e., using RMSD ≤ 2 Å as the criterion for success). RF model scores are highly predictive of overlay accuracy, and their use in template and method selection produces correct overlays in 57% of cases for 349 overlay ligands not used for training RF models. Including protein sequence similarity in the models enables the use of templates bound to related protein structures, yielding useful results even for proteins having no available X-ray structures.
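The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the ligand identifiers, and the score values are hypothetical, the RMSD is computed over atoms assumed to be already matched one-to-one, and the predicted success probabilities are assumed to come from a trained RF model that is not shown here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) coordinates. Assumes atoms are already paired by index."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


def is_success(rmsd_value, cutoff=2.0):
    """Success criterion from the abstract: RMSD <= 2 angstroms."""
    return rmsd_value <= cutoff


def select_overlay(scores):
    """Pick the (template, method) pair with the highest predicted
    probability of success. `scores` is a hypothetical dict mapping
    (template_id, method_name) -> probability from an RF model."""
    return max(scores, key=scores.get)


# Hypothetical model scores for one query ligand (illustrative values only).
example_scores = {
    ("lig_A", "ROCS"): 0.41,
    ("lig_B", "FlexS"): 0.63,
    ("lig_A", "Phase"): 0.22,
}
best_template, best_method = select_overlay(example_scores)

# Toy coordinates: a predicted overlay pose versus the X-ray-derived pose.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
pose_rmsd = rmsd(predicted, reference)
```

In this sketch the highest-scoring pair is chosen before any overlay is evaluated, and the 2 Å cutoff is applied only afterwards to judge whether the chosen overlay reproduced the X-ray-derived pose.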
Pages: 2386-2394
Number of pages: 9