Support vector machine regression (LS-SVM)-an alternative to artificial neural networks (ANNs) for the analysis of quantum chemistry data?

Cited by: 169
Authors
Balabin, Roman M. [1]
Lomakina, Ekaterina I. [2]
Affiliations
[1] ETH, Dept Chem & Appl Biosci, CH-8093 Zurich, Switzerland
[2] ETH, Dept Comp Sci, CH-8093 Zurich, Switzerland
Keywords
NEAR-INFRARED SPECTROSCOPY; COMBINED 1ST-PRINCIPLES CALCULATION; ALKANES RAMAN-SPECTROSCOPY; POTENTIAL-ENERGY SURFACES; DENSITY-FUNCTIONAL THEORY; NIR SPECTROSCOPY; N-PENTANE; GASOLINE CLASSIFICATION; ENTHALPY DIFFERENCE; BASE STOCK;
DOI
10.1039/c1cp00051a
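Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification codes
070304; 081704;
Abstract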
A multilayer feed-forward artificial neural network (MLP-ANN) with a single hidden layer that contains a finite number of neurons can be regarded as a universal non-linear approximator. Today, the ANN method and the linear regression (MLR) model are widely used for quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve the accuracy of QC results (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1 theoretical methods). In this study, an alternative approach based on support vector machines (SVMs) is used: least squares support vector machine (LS-SVM) regression. It has been applied to ab initio (first principles) and density functional theory (DFT) quantum chemistry data. Thus, the QC + SVM methodology is an alternative to the QC + ANN one. The task of the study was to estimate the Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) using smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing a total of 208 organic molecules was constructed and used for LS-SVM training, cross-validation, and testing. MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested. Hartree-Fock (HF/SCF) results were also reported for comparison. Furthermore, constitutional (CD: total number of atoms and mole fractions of different atoms) and quantum-chemical (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) molecular descriptors were used to build the LS-SVM calibration model. Prediction accuracies (MADs) of 1.62 ± 0.51 and 0.85 ± 0.24 kcal mol^-1 (1 kcal mol^-1 = 4.184 kJ mol^-1) were reached for SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to the accuracy of ANN.
The extrapolation and interpolation results show that LS-SVM is superior by almost an order of magnitude over the ANN method in terms of the stability, generality, and robustness of the final model. The LS-SVM model needs a much smaller number of samples (a much smaller sample set) to make accurate predictions. Potential energy surface (PES) approximations for molecular dynamics (MD) studies are discussed as a promising application for the LS-SVM calibration approach.
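
As an illustration of the kind of model the abstract describes, the following is a minimal sketch of LS-SVM regression in its standard textbook form, where training reduces to solving one linear system. It is not the authors' implementation: the RBF kernel, the hyperparameter values (gamma, sigma), the function names, and the synthetic data are all assumptions made for this example, since the abstract does not specify them. The sketch also includes a mean absolute deviation (MAD) helper and the kcal mol^-1 to kJ mol^-1 conversion quoted in the abstract.

```python
import numpy as np

KCAL_TO_KJ = 4.184  # 1 kcal/mol = 4.184 kJ/mol, as stated in the abstract


def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between rows of A and rows of B (kernel choice is an assumption)."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the LS-SVM linear system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual coefficients alpha


def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Predict f(x) = sum_i alpha_i * k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


def mean_absolute_deviation(y_true, y_pred):
    """Mean absolute deviation (MAD) between reference and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Synthetic stand-in data (hypothetical): two descriptors and a smooth target.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(60, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

b, alpha = lssvm_fit(X_train, y_train, gamma=1000.0, sigma=1.0)
y_pred = lssvm_predict(X_train, b, alpha, X_test, sigma=1.0)
mad = mean_absolute_deviation(y_test, y_pred)
print(f"test MAD on synthetic data: {mad:.4f} (arbitrary units)")
```

In a setting like the one described in the abstract, the rows of X would hold the small-basis-set energy together with the molecular descriptors, and y the large-basis-set energy; gamma and sigma would normally be chosen by cross-validation rather than fixed as they are here.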
Pages: 11710-11718
Page count: 9