A large-scale dataset for Korean document-level relation extraction from encyclopedia texts

Cited by: 0
Authors
Son, Suhyune [1]
Lim, Jungwoo [1]
Koo, Seonmin [1]
Kim, Jinsung [1]
Kim, Younghoon [2]
Lim, Youngsik [2]
Hyun, Dongseok [2]
Lim, Heuiseok [1]
Affiliations
[1] Korea Univ, Comp Sci & Engn, 1 5-ka, Anam Dong, Seoul 02841, South Korea
[2] NAVER, 5 Jeongjail ro, Buljeong ro, Seongnam 13561, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Natural Language Processing; Information Extraction; Document-level Relation Extraction; Korean Relation Extraction; ENTITY;
DOI
10.1007/s10489-024-05605-9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document-level relation extraction (RE) aims to predict the relational facts between two given entities in a document. Unlike the extensive research on document-level RE in English, Korean document-level RE research is still at a very early stage due to the absence of a dataset. To accelerate such studies, we present the TREK (Toward Document-Level Relation Extraction in Korean) dataset, constructed from Korean encyclopedia documents written by domain experts. We provide detailed statistical analyses of our large-scale dataset, and human evaluation results suggest that TREK is of assured quality. We also introduce a document-level RE model that considers named entity types while taking the properties of the Korean language into account. In the experiments, we demonstrate that our proposed model outperforms the baselines, and we conduct a qualitative analysis.
Pages: 8681-8701
Number of pages: 21