Support vector machine regression (LS-SVM) - an alternative to artificial neural networks (ANNs) for the analysis of quantum chemistry data?

Times cited: 169
Authors
Balabin, Roman M. [1]
Lomakina, Ekaterina I. [2]
Affiliations
[1] ETH, Dept Chem & Appl Biosci, CH-8093 Zurich, Switzerland
[2] ETH, Dept Comp Sci, CH-8093 Zurich, Switzerland
Keywords
NEAR-INFRARED SPECTROSCOPY; COMBINED 1ST-PRINCIPLES CALCULATION; ALKANES RAMAN-SPECTROSCOPY; POTENTIAL-ENERGY SURFACES; DENSITY-FUNCTIONAL THEORY; NIR SPECTROSCOPY; N-PENTANE; GASOLINE CLASSIFICATION; ENTHALPY DIFFERENCE; BASE STOCK;
DOI
10.1039/c1cp00051a
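Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Subject classification codes
070304; 081704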
Abstract
A multilayer feed-forward artificial neural network (MLP-ANN) with a single hidden layer containing a finite number of neurons can be regarded as a universal non-linear approximator. Today, ANNs and multiple linear regression (MLR) models are widely used in quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve the accuracy of QC methods (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1 theory). In this study, an alternative approach based on support vector machines (SVMs), least-squares support vector machine (LS-SVM) regression, is applied to ab initio (first-principles) and density functional theory (DFT) quantum chemistry data. The QC + SVM methodology is thus an alternative to the QC + ANN one. The task of the study was to estimate Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) from energies calculated with smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing 208 organic molecules was constructed and used for LS-SVM training, cross-validation, and testing. The MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested; Hartree-Fock (HF/SCF) results are also reported for comparison. Constitutional descriptors (CD: total number of atoms and mole fractions of the different atoms) and quantum-chemical descriptors (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) were used to build the LS-SVM calibration model. Prediction accuracies (mean absolute deviations, MADs) of 1.62 +/- 0.51 and 0.85 +/- 0.24 kcal mol(-1) (1 kcal mol(-1) = 4.184 kJ mol(-1)) were reached for the SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to that of the ANN.
The extrapolation and interpolation results show that LS-SVM is superior to the ANN method by almost an order of magnitude in terms of the stability, generality, and robustness of the final model. The LS-SVM model needs a much smaller number of samples (a much smaller sample set) to make accurate predictions. Potential energy surface (PES) approximations for molecular dynamics (MD) studies are discussed as a promising application of the LS-SVM calibration approach.
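The abstract compares LS-SVM regression against MLR (and ANNs) using the mean absolute deviation (MAD). As a rough illustration only, the Python/NumPy sketch below implements the standard dual formulation of LS-SVM regression, in which an RBF kernel matrix K and a single linear system give the bias b and the coefficients alpha, alongside an ordinary least-squares MLR baseline. It is not the authors' code. The kernel choice, the hyperparameter values `gamma` and `sigma`, the names `LSSVMRegressor`, `fit_mlr`, `predict_mlr`, and `mad`, and the synthetic "descriptor" data are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / sigma^2)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)


class LSSVMRegressor:
    """Hypothetical minimal LS-SVM regressor solved in the dual form."""

    def __init__(self, gamma=100.0, sigma=1.0):
        self.gamma = gamma  # regularization constant (assumed value)
        self.sigma = sigma  # RBF kernel width (assumed value)

    def fit(self, X, y):
        n = X.shape[0]
        K = rbf_kernel(X, X, self.sigma)
        # Solve [[0, 1^T], [1, K + I/gamma]] @ [b, alpha] = [0, y]
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = K + np.eye(n) / self.gamma
        rhs = np.concatenate(([0.0], y))
        solution = np.linalg.solve(A, rhs)
        self.b = solution[0]
        self.alpha = solution[1:]
        self.X_train = X.copy()
        return self

    def predict(self, X):
        # f(x) = sum_i alpha_i * K(x, x_i) + b
        return rbf_kernel(X, self.X_train, self.sigma) @ self.alpha + self.b


def fit_mlr(X, y):
    """Ordinary least-squares multiple linear regression with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def mad(y_true, y_pred):
    """Mean absolute deviation between reference and predicted values."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical synthetic data: three "descriptor" columns and a smooth
# non-linear target standing in for an energy to be calibrated (kcal/mol).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 3))
y = 3.0 * np.sin(X[:, 0]) + X[:, 1] ** 2 - 0.5 * X[:, 2] + rng.normal(0.0, 0.1, 200)

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

model = LSSVMRegressor(gamma=100.0, sigma=1.5).fit(X_train, y_train)
svm_mad = mad(y_test, model.predict(X_test))
mlr_mad = mad(y_test, predict_mlr(fit_mlr(X_train, y_train), X_test))
print(f"LS-SVM MAD: {svm_mad:.3f}  MLR MAD: {mlr_mad:.3f}")
```

LS-SVM uses squared errors and equality constraints, so training reduces to one dense linear solve instead of the quadratic program of standard support vector regression. The price is that every training point receives a nonzero alpha, so the model is not sparse. On this synthetic, deliberately non-linear target the linear MLR baseline cannot represent the curvature, which is the kind of gap the abstract reports between LS-SVM and MLR.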
Pages: 11710-11718
Page count: 9