Challenges and Advances in Information Extraction from Scientific Literature: a Review

Cited by: 21
Authors
Hong, Zhi [1]
Ward, Logan [2]
Chard, Kyle [1, 2]
Blaiszik, Ben [1, 2]
Foster, Ian [1, 2]
Affiliations
[1] Univ Chicago, Chicago, IL 60637 USA
[2] Argonne Natl Lab, Lemont, IL USA
Keywords
Information extraction; Text mining; Scientific data; PROPERTY DATA; RECOGNITION; GENERATION; RECAPTCHA; STANDARD; SYSTEM; WEB
DOI
10.1007/s11837-021-04902-9
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Scientific articles have long been the primary means of disseminating scientific discoveries. Over the centuries, valuable data and potentially groundbreaking insights have been collected and buried deep in the mountain of publications. In materials engineering, such data are spread across technical handbooks, specification sheets, journal articles, and laboratory notebooks in myriad formats. Extracting information from papers on a large scale has been a tedious and time-consuming job to which few researchers have wanted to devote their limited time and effort, yet it is essential for modern data-driven design practices. In recent years, the computer science community has made significant progress on techniques for automated information extraction from free text. Yet transformative application of these techniques to scientific literature remains elusive, owing not to a lack of interest or effort but to technical and logistical challenges. Using the challenges in the materials science literature as a driving motivation, we review the gaps between state-of-the-art information extraction methods and the practical application of such methods to scientific texts, and offer a comprehensive overview of work that can be undertaken to close these gaps.
Pages: 3383-3400
Page count: 18
Related papers
50 in total
  • [31] ADVANCES IN INFORMATION EXTRACTION TECHNIQUES
    NAGY, G
    [J]. REMOTE SENSING OF ENVIRONMENT, 1984, 15 (02) : 167 - 175
  • [32] Benefits and Challenges in Information Security Certification - A Systematic Literature Review
    Hulshof, Mike
    Daneva, Maya
    [J]. BUSINESS MODELING AND SOFTWARE DESIGN (BMSD 2021), 2021, 422 : 154 - 169
  • [33] An information extraction and representation system for rapid review of the biomedical literature
    Revere, D
    Fuller, S
    Bugni, PF
    Martin, GM
    [J]. MEDINFO 2004: PROCEEDINGS OF THE 11TH WORLD CONGRESS ON MEDICAL INFORMATICS, PT 1 AND 2, 2004, 107 : 788 - 792
  • [34] Challenges in information extraction from text for knowledge management
    Ciravegna, F
    [J]. IEEE INTELLIGENT SYSTEMS, 2001, 16 (06) : 88 - 90
  • [35] Challenges for Information Extraction from Dialogue in Criminal Law
    Hong, Jenny
    Voss, Catalin
    Manning, Christopher D.
    [J]. NLP4POSIMPACT 2021: THE 1ST WORKSHOP ON NLP FOR POSITIVE IMPACT, 2021, : 71 - 81
  • [36] Automatic extraction of materials and properties from superconductors scientific literature
    Foppiano, Luca
    Castro, Pedro Baptista
    Suarez, Pedro Ortiz
    Terashima, Kensei
    Takano, Yoshihiko
    Ishii, Masashi
    [J]. SCIENCE AND TECHNOLOGY OF ADVANCED MATERIALS-METHODS, 2023, 3 (01):
  • [37] CERMINE - automatic extraction of metadata and references from scientific literature
    Tkaczyk, Dominika
    Szostek, Pawel
    Dendek, Piotr Jan
    Fedoryszak, Mateusz
    Bolikowski, Lukasz
    [J]. 2014 11TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (DAS 2014), 2014, : 217 - 221
  • [38] CERMINE: automatic extraction of structured metadata from scientific literature
    Dominika Tkaczyk
    Paweł Szostek
    Mateusz Fedoryszak
    Piotr Jan Dendek
    Łukasz Bolikowski
    [J]. International Journal on Document Analysis and Recognition (IJDAR), 2015, 18 : 317 - 335
  • [39] Rapid Extraction of Research Areas from Scientific and Technological Literature
    Yin, Chuan
    Liu, Wanzeng
    Yin, Duoduo
    Zhai, Xi
    Liu, Kexin
    Jing, Changfeng
    Huang, He
    [J]. SENSORS AND MATERIALS, 2020, 32 (12) : 4489 - 4504
  • [40] Unleashing the Power of Knowledge Extraction from Scientific Literature in Catalysis
    Zhang, Yue
    Wang, Cong
    Soukaseum, Mya
    Vlachos, Dionisios G.
    Fang, Hui
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2022, 62 (14) : 3316 - 3330