Transfer learning for molecular property predictions from small datasets

Cited: 0
Authors
Kirschbaum, Thorren [1]
Bande, Annika [1,2]
Affiliations
[1] Helmholtz Zentrum Berlin Mat & Energie GmbH, Theory Electron Dynam & Spect, Hahn Meitner Pl 1, D-14109 Berlin, Germany
[2] Leibniz Univ Hannover, Inst Inorgan Chem, Callinstr 9, D-30167 Hannover, Germany
Keywords
FREE-ENERGIES; FREESOLV;
DOI
10.1063/5.0214754
Chinese Library Classification
TB3 [Engineering materials science];
Subject classification codes
0805; 080502;
Abstract
Machine learning has emerged as a new tool in chemistry to bypass expensive experiments or quantum-chemical calculations, for example, in high-throughput screening applications. However, many machine learning studies rely on small datasets, making it difficult to efficiently implement powerful deep learning architectures such as message passing neural networks. In this study, we benchmark common machine learning models for the prediction of molecular properties on two small datasets, for which the best results are obtained with the message passing neural network PaiNN as well as with SOAP molecular descriptors concatenated with a set of simple molecular descriptors, tailored to gradient boosting with regression trees. To further improve the predictive capabilities of PaiNN, we present a transfer learning strategy that uses large datasets to pre-train the respective models and allows us to obtain more accurate models after fine-tuning on the original datasets. The pre-training labels are obtained from computationally cheap ab initio or semi-empirical models, and both datasets are normalized to mean zero and standard deviation one to align the labels' distributions. This study covers two small chemistry datasets: the Harvard Organic Photovoltaics dataset (HOPV, HOMO-LUMO gaps), for which excellent results are obtained, and the FreeSolv dataset (solvation energies), where this method is less successful, probably due to a complex underlying learning task and the dissimilar methods used to obtain the pre-training and fine-tuning labels. Finally, we find that for the HOPV dataset, the final training results do not improve monotonically with the size of the pre-training dataset; instead, pre-training with fewer data points can lead to more biased pre-trained models and higher accuracy after fine-tuning.
(c) 2024 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
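The following is a minimal sketch of the pre-train/fine-tune workflow summarized in the abstract, not the authors' implementation: it uses a plain MLP on precomputed molecular descriptors as a stand-in for PaiNN, and all dataset sizes, array names, and hyperparameters are illustrative assumptions. The two steps it shows are (1) normalizing both label sets to mean zero and standard deviation one to align their distributions and (2) pre-training on a large set of computationally cheap labels before fine-tuning on the small target dataset.

```python
# Hedged sketch of the transfer learning strategy: pre-train on cheap labels,
# then fine-tune on the small target dataset, with both label sets normalized
# to mean zero and unit standard deviation. The MLP stands in for PaiNN.
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def normalize(y):
    """Shift and scale labels to mean zero and standard deviation one."""
    mean, std = y.mean(), y.std()
    return (y - mean) / std, mean, std


def fit(model, X, y, epochs, lr):
    """Train the model with an MSE loss on (descriptor, label) pairs."""
    loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb).squeeze(-1), yb)
            loss.backward()
            opt.step()
    return model


# Synthetic placeholders: a large dataset with cheap (e.g. semi-empirical)
# labels and a small dataset with expensive reference labels.
rng = np.random.default_rng(0)
X_large = torch.tensor(rng.normal(size=(5000, 128)), dtype=torch.float32)
y_cheap = torch.tensor(rng.normal(size=5000), dtype=torch.float32)
X_small = torch.tensor(rng.normal(size=(300, 128)), dtype=torch.float32)
y_ref = torch.tensor(rng.normal(size=300), dtype=torch.float32)

model = nn.Sequential(nn.Linear(128, 256), nn.SiLU(), nn.Linear(256, 1))

# 1) Pre-train on the large dataset with normalized cheap labels.
y_cheap_n, _, _ = normalize(y_cheap)
fit(model, X_large, y_cheap_n, epochs=5, lr=1e-3)

# 2) Fine-tune on the small target dataset, also with normalized labels;
#    predictions are mapped back to the original label scale afterwards.
y_ref_n, mu, sigma = normalize(y_ref)
fit(model, X_small, y_ref_n, epochs=20, lr=1e-4)
with torch.no_grad():
    pred = model(X_small).squeeze(-1) * sigma + mu
```

In the paper itself, the pre-training labels come from cheap ab initio or semi-empirical calculations and the fine-tuning labels from the original HOPV or FreeSolv targets; the shared normalization is what allows the two label distributions to be used with the same model head.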
Pages: 9