Evaluating BERT-based scientific relation classifiers for scholarly knowledge graph construction on digital library collections

Cited by: 4
Authors
Jiang, Ming [1]
D'Souza, Jennifer [2, 3]
Auer, Soeren [2, 3]
Downie, J. Stephen [1]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] TIB Leibniz Informat Ctr Sci & Technol, Hannover, Germany
[3] Leibniz Univ Hannover, L3S Res Ctr, Hannover, Germany
Funding
EU Horizon 2020; US National Science Foundation; European Research Council
Keywords
Digital library; Information extraction; Scholarly text mining; Semantic relation classification; Knowledge graphs; Neural machine learning
DOI
10.1007/s00799-021-00313-y
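Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification code
1205; 120501
Abstract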
The rapid growth of research publications has placed great demands on digital libraries (DL) for advanced information management technologies. To cater to these demands, techniques relying on knowledge-graph structures are being advocated. In such graph-based pipelines, inferring semantic relations between related scientific concepts is a crucial step. Recently, BERT-based pre-trained models have been widely explored for automatic relation classification. Despite significant progress, most of these models were evaluated in different scenarios, which limits their comparability. Furthermore, existing methods are evaluated primarily on clean texts, ignoring the fact that early scholarly publications are digitized through machine scanning and optical character recognition (OCR). In such cases, the texts may contain OCR noise, which in turn creates uncertainty about the performance of existing classifiers. To address these limitations, we started by creating OCR-noisy texts based on three clean corpora. Given these parallel corpora, we conducted a thorough empirical evaluation of eight BERT-based classification models by focusing on three factors: (1) BERT variants; (2) classification strategies; and (3) OCR noise impacts. Experiments on clean data show that the domain-specific pre-trained BERT is the best variant for identifying scientific relations. In general, the strategy of predicting a single relation at a time outperforms the strategy of identifying multiple relations simultaneously. The optimal classifier's performance can decline by around 10% to 20% in F-score on the noisy corpora. Insights discussed in this study can help DL stakeholders select techniques for building optimal knowledge-graph-based systems.
Pages: 197-215
Number of pages: 19
Source
International Journal on Digital Libraries, 2022, 23: 197-215