Linked-DocRED - Enhancing DocRED with Entity-Linking to Evaluate End-To-End Document-Level Information Extraction Pipelines

Cited by: 1
Authors
Genest, Pierre-Yves [1, 2]
Portier, Pierre-Edouard [2]
Egyed-Zsigmond, Elod [2]
Lovisetto, Martino [1]
Affiliations
[1] Alteca, Lyon, France
[2] Univ Lyon, INSA Lyon, CNRS, UCBL, LIRIS, UMR5205, Villeurbanne, France
Keywords
information extraction; document-level relation extraction; entity-linking; dataset
DOI
10.1145/3539618.3591912
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Information Extraction (IE) pipelines aim to extract meaningful entities and relations from documents and structure them into a knowledge graph that can then be used in downstream applications. Training and evaluating such pipelines requires a dataset annotated with entities, coreferences, relations, and entity-linking. However, existing datasets either lack entity-linking labels, are too small, are not diverse enough, or are automatically annotated (that is, without a strong guarantee that the annotations are correct). Therefore, we propose Linked-DocRED, to the best of our knowledge the first manually annotated, large-scale, document-level IE dataset. We enhance the existing and widely used DocRED dataset with entity-linking labels generated through a semi-automatic process that guarantees high-quality annotations. In particular, we use hyperlinks in Wikipedia articles to provide disambiguation candidates. We also propose a complete framework of metrics to benchmark end-to-end IE pipelines, and we define an entity-centric metric to evaluate entity-linking. The evaluation of a baseline shows promising results while highlighting the challenges of an end-to-end IE pipeline. Linked-DocRED, the source code for the entity-linking, the baseline, and the metrics are distributed under an open-source license and can be downloaded from a public repository.
Pages: 3064-3074
Page count: 11