Comparison of Random Forest and Pipeline Pilot Naive Bayes in Prospective QSAR Predictions

Times cited: 81
Authors
Chen, Bin [2]
Sheridan, Robert P. [1]
Hornak, Viktor [1]
Voigt, Johannes H. [1]
Affiliations
[1] Merck Res Labs, Chem Modeling & Informat Dept, Rahway, NJ 07065 USA
[2] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47405 USA
Keywords
COMPOUND CLASSIFICATION; MOLECULAR DESCRIPTOR; SIMILARITY; REGRESSION; MODELS; TOOL; SET
DOI
10.1021/ci200615h
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Random forest is currently considered one of the most accurate QSAR methods available, but it is computationally intensive. Naive Bayes is a simple, robust classification method, and its Laplacian-modified implementation is the preferred QSAR method in Pipeline Pilot, a widely used commercial chemoinformatics platform. We compared the ability of Pipeline Pilot Naive Bayes (PLPNB) and random forest to make accurate predictions on 18 large, diverse in-house QSAR data sets, covering both on-target and ADME-related activities. The data sets were set up as classification problems with either binary or multicategory activities. We divided the data into training and test sets with a time-split method, which we consider a realistic simulation of prospective prediction. PLPNB is computationally efficient; however, on our data sets, random forest predictions are at least as good as those of PLPNB and in many cases significantly better. PLPNB performs better with the ECFP4 and ECFP6 descriptors, which are native to Pipeline Pilot, and worse with the other descriptors we tried.
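The two methodological ingredients named in the abstract, a time-split of training and test sets and a Laplacian-modified Naive Bayes classifier, can be illustrated with the minimal Python sketch below. It assumes the Laplacian-corrected estimator as it is commonly described for Pipeline Pilot, P_corr(active | F) = (A_F + 1) / (T_F + 1/P_base), with a per-feature weight of log(P_corr / P_base) summed over a molecule's features; Pipeline Pilot's actual implementation may differ. The 75% training fraction, the helper names (`time_split`, `train_laplacian_nb`, `score`) and the toy integer features standing in for ECFP4 identifiers are all hypothetical, and only the binary case is covered, not the multicategory problems also used in the paper.

```python
import math
from datetime import date

# Each record is (registration_date, set_of_feature_ids, is_active).
# Integer feature ids stand in for ECFP4 identifiers (hypothetical toy data).


def time_split(records, train_fraction=0.75):
    """Time-split: the earliest `train_fraction` of records (by date) form the
    training set and the later records form the test set (assumed fraction)."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def train_laplacian_nb(train):
    """Return a weight per feature: log(P_corr(active | F) / P_base)."""
    n_active = sum(1 for _, _, active in train if active)
    if n_active == 0:
        raise ValueError("training set needs at least one active")
    p_base = n_active / len(train)
    k = 1.0 / p_base  # assumed Laplacian correction strength K = 1 / P_base
    total = {}    # T_F: training samples containing feature F
    actives = {}  # A_F: active training samples containing feature F
    for _, features, active in train:
        for f in features:
            total[f] = total.get(f, 0) + 1
            if active:
                actives[f] = actives.get(f, 0) + 1
    weights = {}
    for f, t_f in total.items():
        # (A_F + P_base * K) / (T_F + K), which simplifies to (A_F + 1) / (T_F + K)
        p_corr = (actives.get(f, 0) + 1.0) / (t_f + k)
        weights[f] = math.log(p_corr / p_base)
    return weights


def score(weights, features):
    """Sum the weights of a molecule's features; features unseen in training add 0."""
    return sum(weights.get(f, 0.0) for f in features)


# Usage on toy data: train on the earlier 75%, then score the later compounds.
records = [
    (date(2010, 1, 5), {1, 2}, True),
    (date(2010, 2, 1), {2, 3}, False),
    (date(2010, 3, 9), {1, 4}, True),
    (date(2010, 4, 2), {3, 4}, False),
    (date(2011, 1, 1), {1, 2}, True),
    (date(2011, 2, 1), {3}, False),
]
train_set, test_set = time_split(records)
weights = train_laplacian_nb(train_set)
for when, features, active in test_set:
    print(when, active, round(score(weights, features), 3))
```

On this toy data the later active compound scores about 0.405 and the later inactive one about -0.693. The correction shrinks estimates for rarely seen features toward the baseline rate: a feature whose active fraction exactly equals P_base gets P_corr = P_base and therefore a weight of log(1) = 0.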
Pages: 792-803
Number of pages: 12
Related papers
44 in total
  • [1] A Two-Layer Bayes Model: Random Forest Naive Bayes
    Zhang, Wenjun
    Jiang, Liangxiao
    Zhang, Huan
    Chen, Long
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (09): 2040-2051
  • [2] Analysis And Comparison Of Prediction Of Heart Disease Using Novel Random Forest And Naive Bayes Algorithm
    Pavithraa, G.
    Sivaprasad
    [J]. CARDIOMETRY, 2022, (25): 788-793
  • [3] Comparison of Naive Bayes, Support Vector Machine, Decision Trees and Random Forest on Sentiment Analysis
    Guia, Marcio
    Silva, Rodrigo Rocha
    Bernardino, Jorge
    [J]. KDIR: PROCEEDINGS OF THE 11TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT - VOL 1: KDIR, 2019: 525-531
  • [4] Algorithm Implementations Naive Bayes, Random Forest. C4.5 on Online Gaming for Learning Achievement Predictions
    Gata, Windu
    Basri, Hasan
    Hidayat, Rais
    Patras, Yuyun Elizabeth
    Baharuddin, Baharuddin
    Fatmasari, Rhini
    Tohari, Siswanto
    Wardhani, Nia Kusuma
    [J]. PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON RESEARCH OF EDUCATIONAL ADMINISTRATION AND MANAGEMENT (ICREAM 2018), 2018, 258: 1-9
  • [5] The Feature selection and Comparison performance of Student's academic between Random Forest, Naive bayes and XGboost
    Thanarat, Preut
    Kiatjindarat, Waranyoo
    Jareanpon, Chatklaw
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON TEACHING, ASSESSMENT AND LEARNING FOR ENGINEERING, TALE, 2023: 636-641
  • [6] Automated prediction of Coronary Artery Disease using Random Forest and Naive Bayes
    Alotaibi, Sarah Saud
    Almajid, Yasmeen Ahmed
    Alsahali, Samar Fahad
    Asalam, Nida
    Alotaibi, Maha Dhawi
    Ullah, Irfan
    Altabee, Rahaf Mohammed
    [J]. ICACSIS 2020: 2020 12TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS), 2020: 109-113
  • [7] FLOOD SUSCEPTIBILITY MAPPING AND ASSESSMENT USING REGULARIZED RANDOM FOREST AND NAIVE BAYES ALGORITHMS
    Habibi, A.
    Delavar, M. R.
    Sadeghian, M. S.
    Nazari, B.
    [J]. ISPRS GEOSPATIAL CONFERENCE 2022, JOINT 6TH SENSORS AND MODELS IN PHOTOGRAMMETRY AND REMOTE SENSING, SMPR/4TH GEOSPATIAL INFORMATION RESEARCH, GIRESEARCH CONFERENCES, VOL. 10-4, 2023: 241-248
  • [8] IDENTIFYING FAKE NEWS ON TWITTER USING NAIVE BAYES, SVM AND RANDOM FOREST DISTRIBUTED ALGORITHMS
    Cusmaliuc, Ciprian-Gabriel
    Coca, Lucia-Georgiana
    Iftene, Adrian
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE LINGUISTIC RESOURCES AND TOOLS FOR PROCESSING THE ROMANIAN LANGUAGE, 2018: 177-188
  • [9] Detecting Spam Emails/SMS Using Naive Bayes, Support Vector Machine and Random Forest
    Goswami, Vasudha
    Malviya, Vijay
    Sharma, Pratyush
    [J]. INNOVATIVE DATA COMMUNICATION TECHNOLOGIES AND APPLICATION, 2020, 46: 608-615
  • [10] Carbonate Reservoir Rock Type Classification Using Comparison of Naive Bayes and Random Forest Method in Field "S" East Java
    Rosid, M. S.
    Haikel, S.
    Haidar, M. W.
    [J]. PROCEEDINGS OF THE 4TH INTERNATIONAL SYMPOSIUM ON CURRENT PROGRESS IN MATHEMATICS AND SCIENCES (ISCPMS2018), 2019, 2168