Challenges and Advances in Information Extraction from Scientific Literature: a Review

Cited by: 21
Authors
Hong, Zhi [1]
Ward, Logan [2]
Chard, Kyle [1, 2]
Blaiszik, Ben [1, 2]
Foster, Ian [1, 2]
Affiliations
[1] Univ Chicago, Chicago, IL 60637 USA
[2] Argonne Natl Lab, Lemont, IL USA
Keywords
Information extraction; Text mining; Scientific data; PROPERTY DATA; RECOGNITION; GENERATION; RECAPTCHA; STANDARD; SYSTEM; WEB;
DOI
10.1007/s11837-021-04902-9
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Scientific articles have long been the primary means of disseminating scientific discoveries. Over the centuries, valuable data and potentially groundbreaking insights have been collected and buried deep in the mountain of publications. In materials engineering, such data are spread across technical handbooks, specification sheets, journal articles, and laboratory notebooks in myriad formats. Extracting information from papers on a large scale has been a tedious and time-consuming job to which few researchers have wanted to devote their limited time and effort, yet it is an activity that is essential for modern data-driven design practices. In recent years, however, significant progress has been made by the computer science community on techniques for automated information extraction from free text. Yet transformative application of these techniques to scientific literature remains elusive, due not to a lack of interest or effort but to technical and logistical challenges. Using the challenges in the materials science literature as a driving motivation, we review the gaps between state-of-the-art information extraction methods and the practical application of such methods to scientific texts, and offer a comprehensive overview of work that can be undertaken to close these gaps.
Pages: 3383 - 3400
Page count: 18
Related papers
50 in total
  • [1] Challenges and Advances in Information Extraction from Scientific Literature: a Review
    Zhi Hong
    Logan Ward
    Kyle Chard
    Ben Blaiszik
    Ian Foster
    [J]. JOM, 2021, 73 : 3383 - 3400
  • [2] AutoIE: An Automated Framework for Information Extraction from Scientific Literature
    Liu, Yangyang
    Li, Shoubin
    Huang, Kai
    Wang, Qing
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, KSEM 2024, 2024, 14885 : 424 - 436
  • [3] Methodological Challenges for the Comparison of Results of Topic Extraction from Scientific Literature
    Velden, Theresa
    [J]. 16TH INTERNATIONAL CONFERENCE ON SCIENTOMETRICS & INFORMETRICS (ISSI 2017), 2017, : 1558 - 1568
  • [4] Biological network extraction from scientific literature: state of the art and challenges
    Li, Chen
    Liakata, Maria
    Rebholz-Schuhmann, Dietrich
    [J]. BRIEFINGS IN BIOINFORMATICS, 2014, 15 (05) : 856 - 877
  • [5] Review of Knowledge Extraction of Scientific Literature
    Xu, Hongxia
    Li, Chunwang
    [J]. Data Analysis and Knowledge Discovery, 2019, 3 (03) : 14 - 24
  • [6] ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature
    Swain, Matthew C.
    Cole, Jacqueline M.
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2016, 56 (10) : 1894 - 1904
  • [7] Systematic Literature Review of Information Extraction From Textual Data: Recent Methods, Applications, Trends, and Challenges
    Abdullah, Mohd Hafizul Afifi
    Aziz, Norshakirah
    Abdulkadir, Said Jadid
    Alhussian, Hitham Seddig Alhassan
    Talpur, Noureen
    [J]. IEEE ACCESS, 2023, 11 : 10535 - 10562
  • [8] Advances and challenges of SUS in three decades of progress: Integrative review from literature
    Nicola, Lucas Vedovato
    Garcia Alves, Carolina Rezende
    Bertolin, Daniela Comelis
    [J]. JOURNAL OF CLINICAL HYPERTENSION, 2020, 22 (04) : 691 - 691
  • [9] Automatic Metadata Information Extraction from Scientific Literature using Deep Neural Networks
    Yang, Huichen
    Hsu, William
    [J]. FOURTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2021), 2022, 12084
  • [10] Clinical information extraction applications: A literature review
    Wang, Yanshan
    Wang, Liwei
    Rastegar-Mojarad, Majid
    Moon, Sungrim
    Shen, Feichen
    Afzal, Naveed
    Liu, Sijia
    Zeng, Yuqun
    Mehrabi, Saeed
    Sohn, Sunghwan
    Liu, Hongfang
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2018, 77 : 34 - 49