Nanomaterial Synthesis Insights from Machine Learning of Scientific Articles by Extracting, Structuring, and Visualizing Knowledge

Times cited: 45
Authors
Hiszpanski, Anna M. [1]
Gallagher, Brian [2]
Chellappan, Karthik [3]
Li, Peggy [3]
Liu, Shusen [2]
Kim, Hyojin [2]
Han, Jinkyu [1]
Kailkhura, Bhavya [2]
Buttler, David J. [2]
Han, Thomas Yong-Jin [1]
Affiliations
[1] Lawrence Livermore National Laboratory, Materials Science Division, Livermore, CA 94550, USA
[2] Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, Livermore, CA 94550, USA
[3] Lawrence Livermore National Laboratory, Global Security Computing Applications Division, Livermore, CA 94550, USA
Keywords
SEED-MEDIATED GROWTH; AG NANOCUBES; EDGE LENGTH; GOLD; SILVER; NANOWIRES; PERFORMANCE; INFRASTRUCTURE; REDUCTION; PLATFORM
DOI
10.1021/acs.jcim.0c00199
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
Nanomaterials of varying compositions and morphologies are of interest for many applications, from catalysis to optics, but the synthesis of nanomaterials and their scale-up are most often time-consuming, Edisonian processes. Information gleaned from the scientific literature can help inform and accelerate nanomaterials development, but searching the literature and digesting the information are, again, time-consuming manual processes for researchers. To help address these challenges, we developed scientific article-processing tools that extract and structure information from the text and figures of nanomaterials articles, enabling the creation of a personalized knowledge base for nanomaterials synthesis that can be mined to inform further nanomaterials development. Starting with a corpus of approximately 35k nanomaterials-related articles, we developed models to classify articles according to nanomaterial composition and morphology, extract synthesis protocols from the articles' text, and extract, normalize, and categorize chemical terms within those synthesis protocols. We demonstrate the effectiveness of the proposed pipeline on an expert-labeled set of nanomaterials synthesis articles, achieving 100% accuracy on composition prediction, 95% accuracy on morphology prediction, 0.99 AUC on protocol identification, and an F1-score of up to 0.87 on chemical entity recognition. In addition to the articles' text, microscopy images of nanomaterials within the articles are automatically identified and analyzed to determine the nanomaterials' morphologies and size distributions. To let users easily explore the database, we developed a complementary browser-based visualization tool that provides flexibility in comparing subsets of articles of interest.
We use these tools and information to identify trends in nanomaterials synthesis, such as the correlation of certain reagents with various nanomaterial morphologies, which is useful in guiding hypotheses and reducing the potential parameter space during experimental design.
Pages: 2876-2887
Number of pages: 12