Targeting precision: A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization

Cited by: 1
Authors
Jiang M. [1]
D'Souza J. [2]
Auer S. [2]
Downie J.S. [1]
Affiliations
[1] University of Illinois at Urbana-Champaign, Champaign, IL
[2] TIB Leibniz Information Centre for Science and Technology and L3S Research Center at Leibniz University of Hannover, Hannover
Keywords
information extraction; knowledge graphs; relation extraction; scholarly knowledge organization; scholarly text mining
DOI
10.1002/pra2.303
Abstract
Knowledge graphs have been successfully built from unstructured text in general domains such as newswire by leveraging distant-supervision relation signals from linked data repositories such as DBpedia. In contrast, the lack of a comprehensive ontology of scholarly relations makes it difficult to adopt distant supervision in the same way to build knowledge graphs over scholarly articles. To address this difficulty, we propose a hybrid approach to extracting scientific concept relations from scholarly publications that (a) uses syntactic rules as a form of distant supervision to link related scientific term pairs, and (b) trains a classifier to identify the relation type of each linked pair. Our system prioritizes precision over recall: it aims to reduce noisy results, at the cost of extracting fewer relations, when building scholarly knowledge graphs over massive-scale publications. Results on two benchmark datasets show that our hybrid system surpasses the state of the art with an overall F1 score of 60%, driven by a precision gain of nearly 15% in identifying related scientific concepts. On relation classification, we further achieve overall F1 scores ranging from 34.1% to 51.2%, depending on the experimental dataset.

83rd Annual Meeting of the Association for Information Science & Technology, October 25-29, 2020. Author(s) retain copyright, but ASIS&T receives an exclusive publication license.
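The two-stage design described in the abstract, rule-based linking of term pairs followed by a classifier that assigns a relation type, can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the authors' implementation. It assumes scientific terms are already tagged in each sentence, uses a small set of surface trigger words (the hypothetical `TRIGGERS` set and `link_pairs` function) in place of the paper's syntactic rules, swaps in a from-scratch multinomial naive Bayes model (the hypothetical `NaiveBayesRelationClassifier`) for the paper's classifier, and uses made-up relation labels. It also shows the set-based precision, recall, and F1 computation behind scores like those reported above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical trigger words standing in for the paper's syntactic rules.
TRIGGERS = {"for", "using", "on", "as", "of", "with", "to"}


def link_pairs(sentence, terms, max_gap=3):
    """Stage (a): link consecutive pre-tagged terms when a short trigger phrase joins them.

    Returns (head, tail, between_tokens) tuples. Narrow rules trade recall for precision.
    """
    # Order the terms by where they first occur in the sentence.
    found = sorted((sentence.find(t), t) for t in terms if t in sentence)
    pairs = []
    for (i, head), (j, tail) in zip(found, found[1:]):
        between = tuple(sentence[i + len(head):j].lower().split())
        if 0 < len(between) <= max_gap and TRIGGERS & set(between):
            pairs.append((head, tail, between))
    return pairs


class NaiveBayesRelationClassifier:
    """Stage (b): multinomial naive Bayes over the tokens joining a pair (a stand-in model)."""

    def fit(self, examples):
        # examples: iterable of (between_tokens, relation_label)
        self.label_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in examples:
            self.label_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(predicted, gold):
    """Set-based scores over (head, tail) pairs or (head, tail, label) triples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    # Toy training data with hypothetical relation labels.
    train = [
        (("for",), "USAGE"), (("used", "for"), "USAGE"), (("using",), "USAGE"),
        (("of",), "PART_WHOLE"), (("part", "of"), "PART_WHOLE"),
        (("such", "as"), "HYPONYM"),
    ]
    clf = NaiveBayesRelationClassifier().fit(train)

    sentence = "We apply conditional random fields for named entity recognition ."
    terms = ["conditional random fields", "named entity recognition"]
    triples = [(h, t, clf.predict(btw)) for h, t, btw in link_pairs(sentence, terms)]
    print(triples)  # [('conditional random fields', 'named entity recognition', 'USAGE')]

    gold = {("conditional random fields", "named entity recognition", "USAGE"),
            ("named entity recognition", "news text", "USAGE")}
    print(precision_recall_f1(triples, gold))  # (1.0, 0.5, 0.6666666666666666)
```

Keeping `TRIGGERS` small and `max_gap` short makes the linking stage conservative: it proposes fewer pairs, so recall drops, but the pairs it does propose are more likely to be correct. That mirrors, in toy form, the precision-over-recall objective the abstract describes.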
Related papers
  • [1] Dialogue between students previous conceptions and scholarly scientific knowledge: in relation to Amphisbaenias
    Santos Baptista, Geilsa Costa
    Costa Neto, Eraldo Medeiros
    Costa Valverde, Maria Celeste
    [J]. REVISTA IBEROAMERICANA DE EDUCACION, 2008, 47 (02)
  • [2] Towards a Hybrid Human-Computer Scientific Information Extraction Pipeline
    Tchoua, Roselyne B.
    Chard, Kyle
    Audus, Debra J.
    Ward, Logan T.
    Lequieu, Joshua
    de Pablo, Juan J.
    Foster, Ian T.
    [J]. 2017 IEEE 13TH INTERNATIONAL CONFERENCE ON E-SCIENCE (E-SCIENCE), 2017: 109-118
  • [3] Pretrained Knowledge Base Embeddings for improved Sentential Relation Extraction
    Papaluca, Andrea
    Krefl, Daniel
    Suominen, Hanna
    Lenskiy, Artem
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP, 2022: 373-382
  • [4] Evaluating BERT-based scientific relation classifiers for scholarly knowledge graph construction on digital library collections
    Jiang, Ming
    D'Souza, Jennifer
    Auer, Soeren
    Downie, J. Stephen
    [J]. INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, 2022, 23 (02): 197-215
  • [6] Improved distant supervision relation extraction based on edge-reasoning hybrid graph model
    Shen, Shirong
    Duan, Shangfu
    Gao, Huan
    Qi, Guilin
    [J]. JOURNAL OF WEB SEMANTICS, 2021, 70 (70)
  • [7] Entity relation joint extraction method for manufacturing industry knowledge data based on improved BERT algorithm
    Han, Jiao
    Jia, Kang
    [J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (06): 7941-7954
  • [8] Exploiting Functional Discourse Grammar to Enhance Complex Arabic Relation Extraction using a Hybrid Semantic Knowledge Base - Machine Learning Approach
    Osman, Taha
    Khalil, Hussein
    Miltan, Mohammed
    Shaalan, Khaled
    Alfrjani, Rowida
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (08)
  • [10] High precision state of health estimation of lithium-ion batteries based on strong correlation aging feature extraction and improved hybrid kernel function least squares support vector regression machine model
    Feng, Renjun
    Wang, Shunli
    Yu, Chunmei
    Hai, Nan
    Fernandez, Carlos
    [J]. JOURNAL OF ENERGY STORAGE, 2024, 90