Evaluating BERT-based scientific relation classifiers for scholarly knowledge graph construction on digital library collections

Cited by: 4
Authors
Jiang, Ming [1]
D'Souza, Jennifer [2, 3]
Auer, Soeren [2, 3]
Downie, J. Stephen [1]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] TIB Leibniz Informat Ctr Sci & Technol, Hannover, Germany
[3] Leibniz Univ Hannover, L3S Res Ctr, Hannover, Germany
Funding
European Research Council; US National Science Foundation; EU Horizon 2020;
Keywords
Digital library; Information extraction; Scholarly text mining; Semantic relation classification; Knowledge graphs; Neural machine learning;
DOI
10.1007/s00799-021-00313-y
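Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code
1205; 120501;
Abstract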
The rapid growth of research publications has placed great demands on digital libraries (DL) for advanced information management technologies. To cater to these demands, techniques relying on knowledge-graph structures are being advocated. In such graph-based pipelines, inferring semantic relations between related scientific concepts is a crucial step. Recently, BERT-based pre-trained models have been popularly explored for automatic relation classification. Despite significant progress, most of them were evaluated in different scenarios, which limits their comparability. Furthermore, existing methods are primarily evaluated on clean texts, which ignores the digitization context of early scholarly publications in terms of machine scanning and optical character recognition (OCR). In such cases, the texts may contain OCR noise, in turn creating uncertainty about existing classifiers' performance. To address these limitations, we started by creating OCR-noisy texts based on three clean corpora. Given these parallel corpora, we conducted a thorough empirical evaluation of eight BERT-based classification models by focusing on three factors: (1) BERT variants; (2) classification strategies; and (3) OCR noise impacts. Experiments on clean data show that the domain-specific pre-trained BERT is the best variant to identify scientific relations. The strategy of predicting a single relation each time generally outperforms the one that identifies multiple relations simultaneously. The optimal classifier's performance can decline by around 10% to 20% in F-score on the noisy corpora. Insights discussed in this study can help DL stakeholders select techniques for building optimal knowledge-graph-based systems.
Pages: 197-215
Number of pages: 19