Accelerating active learning materials discovery with FAIR data and workflows: A case study for alloy melting temperatures

Authors
Harwani, Mohnish [1]
Verduzco, Juan C. [2,3]
Lee, Brian H. [2,3]
Strachan, Alejandro [2,3]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Purdue Univ, Sch Mat Engn, W Lafayette, IN 47907 USA
[3] Purdue Univ, Birck Nanotechnol Ctr, W Lafayette, IN 47907 USA
Funding
U.S. National Science Foundation
Keywords
Active learning; FAIR data
DOI
10.1016/j.commatsci.2024.113640
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
Active learning (AL) is a powerful sequential optimization approach that has shown great promise in the discovery of new materials. However, a major challenge remains: acquiring the initial data and developing workflows to generate new data at each iteration. In this study, we demonstrate a significant speedup in an optimization task by reusing a published simulation workflow, available for online execution, together with its associated data repository, where the results of every workflow run are automatically stored. Both the workflow and its data follow the FAIR (findable, accessible, interoperable, and reusable) principles using nanoHUB's infrastructure. The workflow uses molecular dynamics to calculate the melting temperature of multi-principal component alloys. We leveraged all prior data not only to develop an accurate machine learning model to start the sequential optimization but also to optimize the simulation parameters and accelerate convergence. Prior work showed that finding the alloy composition with the highest melting temperature required testing several alloy compositions, and establishing the melting temperature of each composition took, on average, multiple simulations. By developing a workflow that uses the FAIR data in the nanoHUB database, we reduced the number of simulations per composition to one and, in a second optimization, found the alloy with the lowest melting temperature after testing only three compositions. This second optimization therefore shows a 10x speedup compared with models that do not access the FAIR databases.
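The core idea above, seeding a sequential-optimization loop with every previously stored workflow result instead of starting cold, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration and not the authors' implementation: `simulate_melting_temperature` is a toy analytic stand-in for the nanoHUB molecular dynamics workflow (not a physical model), the three-component composition grid and the "prior" records are hypothetical, and the surrogate is a simple inverse-distance-weighted predictor with a distance-based uncertainty proxy and an upper-confidence-bound acquisition, swapped in for the machine learning model, which the abstract does not specify.

```python
import math

# Hypothetical pure-component "melting points" (K) for a toy 3-component alloy.
T_PURE = (1800.0, 2000.0, 1500.0)


def simulate_melting_temperature(x):
    """Hypothetical stand-in for one MD melting-temperature workflow run.

    Rule of mixtures plus made-up pairwise interaction terms; not a physical model.
    """
    mix = sum(xi * ti for xi, ti in zip(x, T_PURE))
    return mix + 600.0 * x[0] * x[1] - 300.0 * (x[0] * x[2] + x[1] * x[2])


def composition_grid(step=0.1):
    """All 3-component compositions (fractions summing to 1) on a regular grid."""
    n = round(1 / step)
    return [(i / n, j / n, (n - i - j) / n)
            for i in range(n + 1) for j in range(n + 1 - i)]


def predict(x, data, power=2.0):
    """Surrogate: inverse-distance-weighted mean plus an uncertainty proxy.

    The proxy is the distance to the nearest already-evaluated composition,
    so unexplored regions of composition space look more uncertain.
    """
    num = den = 0.0
    nearest = math.inf
    for xi, yi in data:
        d = math.dist(x, xi)
        if d == 0.0:
            return yi, 0.0  # already evaluated: known value, no uncertainty
        w = d ** -power
        num += w * yi
        den += w
        nearest = min(nearest, d)
    return num / den, nearest


def active_learning(prior, candidates, n_iter, kappa=200.0):
    """Sequential maximization seeded with prior (composition, value) records.

    Each iteration "runs" one new simulation on the untested candidate with the
    highest upper confidence bound, mu + kappa * sigma.
    """
    data = list(prior)  # start from the stored results, not from scratch
    for _ in range(n_iter):
        tested = {x for x, _ in data}
        untested = [x for x in candidates if x not in tested]
        if not untested:
            break

        def ucb(x):
            mu, sigma = predict(x, data)
            return mu + kappa * sigma

        x_next = max(untested, key=ucb)
        data.append((x_next, simulate_melting_temperature(x_next)))
    best = max(data, key=lambda p: p[1])
    return best, data


if __name__ == "__main__":
    grid = composition_grid()
    # Hypothetical "prior database" records: three previously simulated alloys.
    prior = [(x, simulate_melting_temperature(x)) for x in (grid[0], grid[30], grid[-1])]
    best, data = active_learning(prior, grid, n_iter=5)
    print(f"best after {len(data) - len(prior)} new runs: {best}")
```

The design choice worth noting is that `prior` enters the loop exactly like newly simulated points, so a larger stored database directly improves the first surrogate and the first acquisition. To target the lowest melting temperature instead, one could negate the objective before calling `active_learning`.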
Pages: 6