Identifying and embedding transferability in data-driven representations of chemical space

Authors
Gould, Tim [1]
Chan, Bun [2]
Dale, Stephen G. [1,3]
Vuckovic, Stefan [4]
Affiliations
[1] Griffith Univ, Queensland Micro & Nanotechnol Ctr, Nathan, Qld 4111, Australia
[2] Nagasaki Univ, Grad Sch Engn, Bunkyo 1-14, Nagasaki 8528521, Japan
[3] Natl Univ Singapore, Inst Funct Intelligent Mat, 4 Sci Dr 2, Singapore 117544, Singapore
[4] Univ Fribourg, Dept Chem, Fribourg, Switzerland
Funding
Australian Research Council; Swiss National Science Foundation; Japan Society for the Promotion of Science
Keywords
DENSITY-FUNCTIONAL THEORY; EXCHANGE; THERMOCHEMISTRY; APPROXIMATIONS; DFT; AI;
DOI
10.1039/d4sc02358g
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Transferability, especially in the context of model generalization, is a paradigm of all scientific disciplines. However, the rapid advancement of machine-learned model development threatens this paradigm, as it can be difficult to understand how transferability is embedded (or missed) in complex models developed using large training data sets. Two related open problems are how to identify, without relying on human intuition, what makes training data transferable, and how to embed transferability into training data. To solve both problems for ab initio chemical modelling, an indispensable tool in everyday chemistry research, we introduce a transferability assessment tool (TAT) and demonstrate it on a controllable data-driven model for developing density functional approximations (DFAs). We reveal that human intuition in the curation of training data introduces chemical biases that can hamper the transferability of data-driven DFAs. We use our TAT to motivate three transferability principles, one of which introduces the key concept of transferable diversity. Finally, we propose data curation strategies for general-purpose machine learning models in chemistry that identify and embed the transferability principles.
Pages: 11122-11133
Number of pages: 12