Support vector machine regression (LS-SVM) - an alternative to artificial neural networks (ANNs) for the analysis of quantum chemistry data?

Cited by: 169
Authors
Balabin, Roman M. [1 ]
Lomakina, Ekaterina I. [2 ]
Affiliations
[1] ETH, Dept Chem & Appl Biosci, CH-8093 Zurich, Switzerland
[2] ETH, Dept Comp Sci, CH-8093 Zurich, Switzerland
Keywords
NEAR-INFRARED SPECTROSCOPY; COMBINED 1ST-PRINCIPLES CALCULATION; ALKANES RAMAN-SPECTROSCOPY; POTENTIAL-ENERGY SURFACES; DENSITY-FUNCTIONAL THEORY; NIR SPECTROSCOPY; N-PENTANE; GASOLINE CLASSIFICATION; ENTHALPY DIFFERENCE; BASE STOCK;
DOI
10.1039/c1cp00051a
Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Subject classification codes
070304 ; 081704 ;
Abstract
A multilayer feed-forward artificial neural network (MLP-ANN) with a single hidden layer that contains a finite number of neurons can be regarded as a universal non-linear approximator. Today, the ANN method and the multiple linear regression (MLR) model are widely used for quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve the accuracy of the underlying theoretical methods (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1). In this study, an alternative approach based on support vector machines (SVMs) is used: least squares support vector machine (LS-SVM) regression. It has been applied to ab initio (first-principles) and density functional theory (DFT) quantum chemistry data, so the QC + SVM methodology is an alternative to the QC + ANN one. The task of the study was to estimate the Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) using smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing a total of 208 organic molecules was constructed and used for the LS-SVM training, cross-validation, and testing. MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested. Hartree-Fock (HF/SCF) results were also reported for comparison. Furthermore, constitutional (CD: total number of atoms and mole fractions of different atoms) and quantum-chemical (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) molecular descriptors were used to build the LS-SVM calibration model. Prediction accuracies (MADs) of 1.62 ± 0.51 and 0.85 ± 0.24 kcal mol⁻¹ (1 kcal mol⁻¹ = 4.184 kJ mol⁻¹) were reached for SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to that of the ANN. The extrapolation and interpolation results show that LS-SVM outperforms the ANN method by almost an order of magnitude in terms of the stability, generality, and robustness of the final model. The LS-SVM model needs a much smaller number of samples (a much smaller sample set) to produce accurate predictions. Potential energy surface (PES) approximations for molecular dynamics (MD) studies are discussed as a promising application for the LS-SVM calibration approach.
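The abstract names LS-SVM regression but does not spell out how such a model is fitted. As a rough illustration only, the sketch below uses the standard textbook LS-SVM formulation, in which training reduces to solving one linear system for a bias `b` and coefficients `alpha`. It is not the authors' implementation: the RBF kernel, the hyperparameter values `gamma` and `sigma`, all function names, and the synthetic data (standing in for descriptors and energy corrections, not the BRM-208 set) are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LS-SVM system [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # constraint: sum(alpha) = 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = K + np.eye(n) / gamma   # regularized kernel matrix
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]              # b, alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma):
    """Evaluate f(x) = sum_i alpha_i k(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def mean_absolute_deviation(y_true, y_pred):
    """MAD between reference and predicted values."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Synthetic stand-in data (assumption): 3 made-up descriptors, smooth target.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(80, 3))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 - X[:, 2]
X_train, y_train = X[:60], y[:60]
X_test, y_test = X[60:], y[60:]

GAMMA, SIGMA = 100.0, 1.0               # illustrative values, not tuned
b, alpha = lssvm_fit(X_train, y_train, GAMMA, SIGMA)
y_pred = lssvm_predict(X_train, b, alpha, X_test, SIGMA)
mad = mean_absolute_deviation(y_test, y_pred)
print(f"test MAD on synthetic data: {mad:.4f}")
```

In practice `gamma` and `sigma` would be chosen by cross-validation, which is consistent with the training, cross-validation, and testing split mentioned in the abstract; the fixed values here are placeholders.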
Pages: 11710-11718
Number of pages: 9