Machine Learning with Enormous "Synthetic" Data Sets: Predicting Glass Transition Temperature of Polyimides Using Graph Convolutional Neural Networks

Cited by: 24
Authors
Volgin, Igor V. [1]
Batyr, Pavel A. [2,3]
Matseevich, Andrey V.
Dobrovskiy, Alexey Yu. [1]
Andreeva, Maria V. [1]
Nazarychev, Victor M. [1]
Larin, Sergey V. [1]
Goikhman, Mikhail Ya. [1]
Vizilter, Yury V. [2]
Askadskii, Andrey A. [3,4]
Lyulin, Sergey V. [1]
Affiliations
[1] Institute of Macromolecular Compounds, Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russia
[2] State Research Institute of Aviation Systems (GosNIIAS), Federal State Unitary Enterprise, Moscow 125167, Russia
[3] A. N. Nesmeyanov Institute of Organoelement Compounds, Russian Academy of Sciences (INEOS RAS), Moscow 119991, Russia
[4] Moscow State University of Civil Engineering (MGSU), Moscow 129337, Russia
Source
ACS Omega, 2022, Vol. 7, Issue 48
Funding
Russian Science Foundation
Keywords
MOLECULAR-DYNAMICS SIMULATIONS; UNIT STRUCTURE; POLYMERS; WEIGHT; OPPORTUNITIES; POLYSTYRENES; DEPENDENCE; DISCOVERY; ACRYLATE; DESIGN
DOI
10.1021/acsomega.2c04649
Chinese Library Classification
O6 [Chemistry]
Discipline code
0703
Abstract
In the present work, we address the problem of using machine learning (ML) methods to predict the thermal properties of polymers by establishing "structure-property" relationships. Focusing on a particular class of heterocyclic polymers, namely polyimides (PIs), we developed a graph convolutional neural network (GCNN), one of the most promising tools for working with big data, to predict the PI glass transition temperature Tg as an example of a fundamental polymer property. To train the GCNN, we propose an original methodology based on a "transfer learning" approach, with an enormous "synthetic" data set for pretraining and a small experimental data set for fine-tuning. The "synthetic" data set contains more than 6 million combinatorially generated PI repeating units together with theoretical values of their Tg calculated using the well-established Askadskii quantitative structure-property relationship (QSPR) computational scheme. In addition, an experimental data set for 214 PIs was collected from the literature for training, fine-tuning, and validation of the GCNN. Both the "synthetic" and experimental data sets are included in the PolyAskInG database (Polymer Askadskii's Intelligent Gateway). Using the PolyAskInG database, we developed a GCNN that estimates the Tg of PIs with a mean absolute error (MAE) of about 20 K, which is 1.5 times lower than that of the Askadskii QSPR analysis (33 K). To demonstrate the efficiency and usability of the proposed GCNN architecture and training methodology for predicting polymer properties, we also used "transfer learning" to develop an alternative GCNN, pretrained on proxy characteristics taken from the popular quantum chemical QM9 database of small compounds and fine-tuned on the experimental Tg data set from the PolyAskInG database.
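To make the GCNN idea concrete, here is a minimal plain-Python sketch of a single graph-convolution step applied to a toy molecular graph. It assumes the generic propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by mean pooling over atoms and a linear head that returns one scalar (standing in for a Tg prediction). The abstract does not describe the authors' architecture, so this is not their model; the function names, the toy C-N-C fragment, the atom features, and the weights are all hypothetical.

```python
import math

def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop so each atom keeps its own features
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    """Plain-list matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def gcn_layer(a_hat, h, w):
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def predict_scalar(a_hat, h, w1, w_out, bias):
    """Hypothetical readout: GCN layer, mean-pool over atoms, linear head."""
    h1 = gcn_layer(a_hat, h, w1)
    pooled = [sum(col) / len(h1) for col in zip(*h1)]  # average each feature column
    return sum(p * w for p, w in zip(pooled, w_out)) + bias

# Toy C-N-C fragment with one-hot atom features [is_C, is_N] (hypothetical weights).
w1 = [[1.0, 0.0], [0.0, 1.0]]
a = normalized_adjacency(3, [(0, 1), (1, 2)])
p1 = predict_scalar(a, [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], w1, [1.0, 1.0], 0.0)
# The same fragment with the N atom numbered first.
b = normalized_adjacency(3, [(0, 1), (0, 2)])
p2 = predict_scalar(b, [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]], w1, [1.0, 1.0], 0.0)
print(abs(p1 - p2) < 1e-9)  # prints True
```

Renumbering the atoms changes A and H but leaves the pooled prediction unchanged, which is the property that lets a GCNN operate directly on the graph of a repeating unit. In a trained model, w1, w_out, and bias would be learned, first on the large pretraining set and then adjusted during fine-tuning.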
The results indicate that pretraining the GCNN on the "synthetic" polymer data set yields an MAE almost half that obtained when the QM9 data set is used in the pretraining stage (approximately 41 K). Furthermore, we address how the differences in size between the experimental and "synthetic" data sets (the so-called "reality gap" problem), as well as their chemical composition, influence training quality. Our results indicate that polymer data sets should generally be preferred when developing deep neural networks, and GCNNs in particular, for efficient prediction of polymer properties. Moreover, our work poses the challenge of generating large, theoretically supported "synthetic" data sets of polymer properties for training complex ML models. The proposed methodology is rather versatile and may be generalized to predict other properties of various polymers and copolymers synthesized by polycondensation.
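The error comparisons in the abstract are ratios of mean absolute errors. The short sketch below spells out the MAE definition and computes those ratios from the approximate values quoted above (about 20 K for the GCNN pretrained on the "synthetic" PI data, 33 K for the Askadskii QSPR scheme, and about 41 K for the GCNN pretrained on QM9). The Tg pairs in the last line are made up purely to show the function call. The ratios come out at roughly 1.65 and 2.05; because the quoted MAEs are themselves rounded, these should be read as approximate.

```python
def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum(|y_i - y_hat_i|), in the units of y (here K)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Approximate MAE values quoted in the abstract, in K.
reported_mae = {
    "GCNN, pretrained on synthetic PI data": 20.0,
    "Askadskii QSPR": 33.0,
    "GCNN, pretrained on QM9": 41.0,
}
best = reported_mae["GCNN, pretrained on synthetic PI data"]
for name, value in reported_mae.items():
    # A ratio above 1 means this method's error is that many times the best GCNN's error.
    print(f"{name}: {value:.0f} K, ratio to best = {value / best:.2f}")

# Hypothetical Tg values (K), only to demonstrate mae().
print(mae([500.0, 550.0], [520.0, 530.0]))  # prints 20.0
```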
Pages: 43678-43691
Page count: 14