Gaussian Process-Based Refinement of Dispersion Corrections

Cited by: 40
Authors
Proppe, Jonny [1, 2, 3]
Gugler, Stefan [3]
Reiher, Markus [3]
Affiliations
[1] Univ Toronto, Dept Chem, Toronto, ON M5S, Canada
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S, Canada
[3] Swiss Fed Inst Technol, Lab Phys Chem, Vladimir Prelog Weg 2, CH-8093 Zurich, Switzerland
Funding
Swiss National Science Foundation
Keywords
DENSITY-FUNCTIONAL THEORY; BASIS-SET CONVERGENCE; PREDICTION UNCERTAINTY; RARE-GAS; ENERGIES; ACCURATE; APPROXIMATIONS; POTENTIALS; DATABASE; VALENCE;
DOI
10.1021/acs.jctc.9b00627
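Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract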
We employ Gaussian process (GP) regression to adjust for systematic errors in D3-type dispersion corrections. We refer to the associated, statistically improved model as D3-GP. It is trained on differences between interaction energies obtained from PBE-D3(BJ)/ma-def2-QZVPP and DLPNO-CCSD(T)/CBS calculations. We generated a data set containing interaction energies for 1248 molecular dimers, which resemble the dispersion-dominated systems contained in the S66 data set. Our systems represent not only equilibrium structures but also dimers with various relative orientations and conformations at both shorter and longer distances. A reparametrization of the D3(BJ) model based on 66 of these dimers suggests that two of its three empirical parameters, a1 and s8, are zero, whereas a2 = 5.6841 bohr. For the remaining 1182 dimers, we find that this new set of parameters is superior to all previously published D3(BJ) parameter sets. To train our D3-GP model, we engineered two different vectorial representations of (supra-)molecular systems, both derived from the matrix of atom-pairwise D3(BJ) interaction terms: (a) a distance-resolved interaction energy histogram, histD3(BJ), and (b) eigenvalues of the interaction matrix ordered according to their decreasing absolute value, eigD3(BJ). Hence, the GP learns a mapping from D3(BJ) information only, which renders D3-GP-type dispersion corrections comparable to those obtained with the original D3 approach. They improve systematically if the underlying training set is selected carefully. Here, we harness the prediction variance obtained from GP regression to select optimal training sets in an automated fashion. The larger the variance, the more information the corresponding data point may add to the training set.
For a given set of molecular systems, variance-based sampling can approximately determine the smallest subset being subjected to reference calculations such that all dispersion corrections for the remaining systems fall below a predefined accuracy threshold. To render the entire D3-GP workflow as efficient as possible, we present an improvement over our variance-based, sequential active-learning scheme [J. Chem. Theory Comput. 2018, 14, 5238]. Our refined learning algorithm selects multiple (instead of single) systems that can be subjected to reference calculations simultaneously. We refer to the underlying selection strategy as batchwise variance-based sampling (BVS). BVS-guided active learning is an essential component of our D3-GP workflow, which is implemented in a black-box fashion. Once provided with reference data for new molecular systems, the underlying GP model automatically learns to adapt to these and similar systems. This approach leads overall to a self-improving model (D3-GP) that predicts system-focused and GP-refined D3-type dispersion corrections for any given system of reference data.
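The idea of selecting several systems at once by prediction variance can be illustrated with a minimal sketch. It is not the authors' BVS algorithm, whose details the abstract does not give. It assumes a squared-exponential kernel, a fixed length scale, and a greedy rule: pick the pool point with the largest GP posterior variance, condition the posterior covariance on that point, and repeat. This works without reference values because the GP posterior variance depends only on the inputs, not on the observed targets. The names `rbf_kernel`, `posterior_cov`, and `select_batch` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def posterior_cov(X_train, X_pool, noise=1e-6, length_scale=1.0):
    """GP posterior covariance of the pool points given the training inputs."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_pool, length_scale)
    K_ss = rbf_kernel(X_pool, X_pool, length_scale)
    return K_ss - K_s.T @ np.linalg.solve(K, K_s)


def select_batch(X_train, X_pool, batch_size, noise=1e-6, length_scale=1.0):
    """Greedily pick pool indices with the largest posterior variance.

    After each pick, the covariance is conditioned on the chosen point
    (rank-one update), so near-duplicates of it are not picked again.
    """
    cov = posterior_cov(X_train, X_pool, noise, length_scale)
    chosen = []
    for _ in range(batch_size):
        var = np.diag(cov).copy()
        var[chosen] = -np.inf  # never pick the same point twice
        i = int(np.argmax(var))
        chosen.append(i)
        cov = cov - np.outer(cov[:, i], cov[:, i]) / (cov[i, i] + noise)
    return chosen


# Toy one-dimensional example (made-up feature values, not D3(BJ) data):
# one training point at 0, and a pool with two nearly identical points near 5.
X_train = np.array([[0.0]])
X_pool = np.array([[0.1], [5.0], [5.05], [-5.0]])
batch = select_batch(X_train, X_pool, batch_size=2)
print(batch)
```

In this toy pool, the two points near 5 carry almost the same information, so the greedy update keeps the batch from containing both. A plain "top-k variance" rule has no such safeguard, which is one reason a batch strategy may need more than a sort by variance.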
Pages: 6046-6060
Number of pages: 15