Transfer learning for molecular property predictions from small datasets

Cited by: 0
Authors
Kirschbaum, Thorren [1 ]
Bande, Annika [1 ,2 ]
Affiliations
[1] Helmholtz Zentrum Berlin Mat & Energie GmbH, Theory Electron Dynam & Spect, Hahn Meitner Pl 1, D-14109 Berlin, Germany
[2] Leibniz Univ Hannover, Inst Inorgan Chem, Callinstr 9, D-30167 Hannover, Germany
Keywords
Free energies; FreeSolv
DOI
10.1063/5.0214754
Chinese Library Classification (CLC)
TB3 [Engineering materials science]
Discipline classification codes
0805; 080502
Abstract
Machine learning has emerged as a new tool in chemistry to bypass expensive experiments or quantum-chemical calculations, for example, in high-throughput screening applications. However, many machine learning studies rely on small datasets, making it difficult to efficiently implement powerful deep learning architectures such as message passing neural networks. In this study, we benchmark common machine learning models for the prediction of molecular properties on two small datasets, for which the best results are obtained with the message passing neural network PaiNN as well as SOAP molecular descriptors concatenated to a set of simple molecular descriptors tailored to gradient boosting with regression trees. To further improve the predictive capabilities of PaiNN, we present a transfer learning strategy that uses large datasets to pre-train the respective models and allows us to obtain more accurate models after fine-tuning on the original datasets. The pre-training labels are obtained from computationally cheap ab initio or semi-empirical models, and both datasets are normalized to mean zero and standard deviation one to align the labels' distributions. This study covers two small chemistry datasets: the Harvard Organic Photovoltaics dataset (HOPV, HOMO-LUMO gaps), for which excellent results are obtained, and the FreeSolv dataset (solvation energies), where this method is less successful, probably due to a complex underlying learning task and the dissimilar methods used to obtain pre-training and fine-tuning labels. Finally, we find that for the HOPV dataset, the final training results do not improve monotonically with the size of the pre-training dataset; instead, pre-training with fewer data points can lead to more biased pre-trained models and higher accuracy after fine-tuning.
(c) 2024 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pages: 9
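The transfer learning recipe described in the abstract reduces to three steps: z-score normalize the cheap pre-training labels and the accurate fine-tuning labels separately so the two distributions align, train on the large cheaply labeled set, then fine-tune on the small accurately labeled set. Below is a minimal PyTorch sketch of that recipe under stated assumptions: the MLP stand-in for PaiNN, the random placeholder data, and all hyperparameters (layer sizes, epochs, learning rates) are illustrative and not the paper's actual setup.

```python
import torch
import torch.nn as nn

def zscore(y):
    # Normalize labels to mean zero and standard deviation one,
    # returning the statistics needed to invert the transform.
    mu, sigma = y.mean(), y.std()
    return (y - mu) / sigma, mu, sigma

def train(model, x, y, epochs, lr):
    # Plain full-batch regression training loop.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x).squeeze(-1), y)
        loss.backward()
        opt.step()

# Illustrative stand-in data. In the paper, pre-training labels come from
# cheap ab initio or semi-empirical calculations on a large dataset, and
# fine-tuning labels from the small, accurately labeled target dataset.
x_pre, y_pre = torch.randn(5000, 64), torch.randn(5000)   # large, cheap labels
x_fine, y_fine = torch.randn(200, 64), torch.randn(200)   # small, accurate labels

# Align the two label distributions: z-score each set with its own statistics.
y_pre_n, _, _ = zscore(y_pre)
y_fine_n, mu, sigma = zscore(y_fine)

# Simple MLP regressor as a hypothetical stand-in for PaiNN.
model = nn.Sequential(nn.Linear(64, 128), nn.SiLU(), nn.Linear(128, 1))

train(model, x_pre, y_pre_n, epochs=100, lr=1e-3)    # pre-training
train(model, x_fine, y_fine_n, epochs=100, lr=1e-4)  # fine-tuning, lower LR

# De-normalize predictions using the fine-tuning statistics.
with torch.no_grad():
    y_pred = model(x_fine).squeeze(-1) * sigma + mu
```

Normalizing each label set with its own mean and standard deviation is what lets labels from different levels of theory share one output head; only the fine-tuning statistics are used to map predictions back to physical units.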