Identifying and embedding transferability in data-driven representations of chemical space

Times cited: 0
Authors
Gould, Tim [1]
Chan, Bun [2]
Dale, Stephen G. [1,3]
Vuckovic, Stefan [4]
Affiliations
[1] Griffith Univ, Queensland Micro & Nanotechnol Ctr, Nathan, Qld 4111, Australia
[2] Nagasaki Univ, Grad Sch Engn, Bunkyo 1-14, Nagasaki 8528521, Japan
[3] Natl Univ Singapore, Inst Funct Intelligent Mat, 4 Sci Dr 2, Singapore 117544, Singapore
[4] Univ Fribourg, Dept Chem, Fribourg, Switzerland
Funding
Australian Research Council; Swiss National Science Foundation; Japan Society for the Promotion of Science
Keywords
DENSITY-FUNCTIONAL THEORY; EXCHANGE; THERMOCHEMISTRY; APPROXIMATIONS; DFT; AI
DOI
10.1039/d4sc02358g
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Transferability, especially in the context of model generalization, is a paradigm of all scientific disciplines. However, the rapid advancement of machine-learned models threatens this paradigm, as it can be difficult to understand how transferability is embedded (or missed) in complex models developed using large training data sets. Two related open problems are how to identify, without relying on human intuition, what makes training data transferable, and how to embed transferability into training data. To solve both problems for ab initio chemical modelling, an indispensable tool in everyday chemistry research, we introduce a transferability assessment tool (TAT) and demonstrate it on a controllable data-driven model for developing density functional approximations (DFAs). We reveal that human intuition in the curation of training data introduces chemical biases that can hamper the transferability of data-driven DFAs. We use our TAT to motivate three transferability principles, one of which introduces the key concept of transferable diversity. Finally, we propose data curation strategies for general-purpose machine learning models in chemistry that identify and embed the transferability principles.
Pages: 11122-11133
Number of pages: 12
Related papers (50 in total)
  • [1] Model and data-driven representations of the sleep cycle using locally linear embedding
    Lopour, Beth A.
    Kirsch, Heidi E.
    Sleigh, James W.
    Szeri, Andrew J.
    [J]. BMC Neuroscience, 10 (Suppl 1)
  • [2] Manifold embedding data-driven mechanics
    Bahmani, Bahador
    Sun, WaiChing
    [J]. JOURNAL OF THE MECHANICS AND PHYSICS OF SOLIDS, 2022, 166
  • [3] Software ecosystem for the data-driven design of chemical systems and the exploration of chemical space
    Hachmann, Johannes
    Haghighatlari, Mojtaba
    Evangelista, William
    Afzal, Mohammad Atif Faiz
    Shih, Ching-Yen
    Moore, Bryan
    Pechagin, Mikhail
    Tian, Yujie
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2016, 252
  • [4] Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations
    Winter, Robin
    Montanari, Floriane
    Noe, Frank
    Clevert, Djork-Arne
    [J]. CHEMICAL SCIENCE, 2019, 10 (06) : 1692 - 1701
  • [5] Identifying Arguments of Space-Time Fractional Diffusion: Data-Driven Approach
    Znaidi, Mohamed Ridha
    Gupta, Gaurav
    Asgari, Kamiar
    Bogdan, Paul
    [J]. FRONTIERS IN APPLIED MATHEMATICS AND STATISTICS, 2020, 6
  • [6] Building and deploying a cyberinfrastructure for the data-driven design of chemical systems and the exploration of chemical space
    Hachmann, Johannes
    Afzal, Mohammad Atif Faiz
    Haghighatlari, Mojtaba
    Pal, Yudhajit
    [J]. MOLECULAR SIMULATION, 2018, 44 (11) : 921 - 929
  • [7] A data-driven investigation of human action representations
    Dima, Diana C.
    Hebart, Martin N.
    Isik, Leyla
    [J]. SCIENTIFIC REPORTS, 2023, 13 (01)
  • [8] Art driven by visual representations of chemical space
    Gaytán-Hernández, D.
    Chávez-Hernández, A.L.
    López-López, E.
    Miranda-Salas, J.
    Saldívar-González, F.I.
    Medina-Franco, J.L.
    [J]. Journal of Cheminformatics, 15 (1)
  • [9] Transferability and robustness of a data-driven model built on a large number of buildings
    Yan, Ruofei
    Zhao, Tianyi
    Rezgui, Yacine
    Kubicki, Sylvain
    Li, Yu
    [J]. JOURNAL OF BUILDING ENGINEERING, 2023, 80