MaterialBERT for natural language processing of materials science texts

Cited by: 15
Authors
Yoshitake, Michiko [1]
Sato, Fumitaka [1, 2]
Kawano, Hiroyuki [1, 2]
Teraoka, Hiroshi [1, 2]
Affiliations
[1] National Institute for Materials Science (NIMS), MaDIS, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
[2] Ridgelinez, Business Science Unit, Tokyo, Japan
Source
SCIENCE AND TECHNOLOGY OF ADVANCED MATERIALS-METHODS | 2022 / Vol. 2 / No. 01
Keywords
Word embedding; pre-training; BERT; literal information
DOI
10.1080/27660400.2022.2124831
Chinese Library Classification
T [Industrial Technology]
Discipline code
08
Abstract
A BERT (Bidirectional Encoder Representations from Transformers) model, which we named "MaterialBERT", has been generated using scientific papers from a wide area of materials science as the corpus. A new vocabulary list for the tokenizer was generated from the materials science corpus. Two BERT models with different tokenizer vocabulary lists were generated: one with the original vocabulary made by Google, and the other with the vocabulary newly made by the authors. The word vectors embedded during pre-training of the two MaterialBERT models reasonably reflect the meanings of material names. This holds both in material-class clustering and in the relationship between base materials and their compounds or derivatives. It applies not only to inorganic materials but also to organic materials and organometallic compounds. Fine-tuning the pre-trained MaterialBERT on CoLA (the Corpus of Linguistic Acceptability) gave a higher score than fine-tuning the original BERT. The two MaterialBERTs could also be used as a starting point for transfer learning of a narrower, domain-specific BERT.
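The main design choice in the abstract is the tokenizer vocabulary. The following Python sketch gives a rough illustration of why it matters. It implements the greedy longest-match-first WordPiece splitting that BERT tokenizers apply to each word, and compares two tiny vocabularies.

The function name `wordpiece` and both vocabularies (`general_vocab`, `domain_vocab`) are hypothetical. They are not Google's vocabulary or the MaterialBERT vocabulary. The sketch also omits BERT's basic tokenization step (lowercasing, punctuation splitting). It does not describe how the authors actually built their vocabulary.

```python
from typing import List, Set

UNK = "[UNK]"


def wordpiece(word: str, vocab: Set[str], max_chars: int = 100) -> List[str]:
    """Greedy longest-match-first WordPiece split of one word (sketch).

    Pieces after the first are looked up with the '##' continuation prefix,
    following the BERT convention.
    """
    if len(word) > max_chars:
        return [UNK]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word becomes [UNK].
            return [UNK]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical general-domain vocabulary: only short subword pieces.
general_vocab = {"per", "##ov", "##sk", "##ite", "ti", "##tan", "##ate"}
# Hypothetical domain vocabulary: same pieces plus whole materials terms.
domain_vocab = general_vocab | {"perovskite", "titanate"}

if __name__ == "__main__":
    for name, vocab in [("general", general_vocab), ("domain", domain_vocab)]:
        print(name, [wordpiece(w, vocab) for w in ["perovskite", "titanate"]])
```

With the hypothetical general vocabulary, "perovskite" is split into four pieces (`per`, `##ov`, `##sk`, `##ite`). The domain vocabulary keeps it as a single token, so one embedding can be learned for the material name itself.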
Pages: 372-380
Page count: 9