REDFM: a Filtered and Multilingual Relation Extraction Dataset

Cited by: 0
Authors
Cabot, Pere-Lluis Huguet [1, 2]
Tedeschi, Simone [1, 2]
Ngomo, Axel-Cyrille Ngonga [3]
Navigli, Roberto [2]
Affiliations
[1] Babelscape, Rome, Italy
[2] Sapienza Univ Rome, Rome, Italy
[3] Paderborn Univ, Paderborn, Germany
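Funding
EU Horizon 2020
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract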
Relation Extraction (RE) is a task that identifies relationships between entities in a text, enabling the acquisition of relational facts and bridging the gap between natural language and structured knowledge. However, current RE models often rely on small datasets with low coverage of relation types, particularly when working with languages other than English. In this paper, we address the above issue and provide two new resources that enable the training and evaluation of multilingual RE systems. First, we present SREDFM, an automatically annotated dataset covering 18 languages, 400 relation types, 13 entity types, totaling more than 40 million triplet instances. Second, we propose REDFM, a smaller, human-revised dataset for seven languages that allows for the evaluation of multilingual RE systems. To demonstrate the utility of these novel datasets, we experiment with the first end-to-end multilingual RE model, mREBEL, that extracts triplets, including entity types, in multiple languages. We release our resources and model checkpoints at https://www.github.com/babelscape/rebel.
Pages: 4326-4343
Page count: 18
Related papers
50 in total
  • [1] Multilingual Entity and Relation Extraction Dataset and Model
    Seganti, Alessandro
    Firlag, Klaudia
    Skowronska, Helena
    Satlawa, Michal
    Andruszkiewicz, Piotr
    [J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021: 1946-1955
  • [2] MultiTACRED: A Multilingual Version of the TAC Relation Extraction Dataset
    Hennig, Leonhard
    Thomas, Philippe
    Moeller, Sebastian
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 3785-3801
  • [3] DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction
    Bhartiya, Abhyuday
    Badola, Kartikeya
    Mausam
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): (SHORT PAPERS), VOL 2, 2022: 849-863
  • [4] A Dataset for Multilingual Epidemiological Event Extraction
    Mutuvi, Stephen
    Doucet, Antoine
    Lejeune, Gael
    Odeo, Moses
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 4139-4144
  • [5] REFinD: Relation Extraction Financial Dataset
    Kaur, Simerjot
    Smiley, Charese
    Gupta, Akshat
    Sain, Joy
    Wang, Dongsheng
    Siddagangappa, Suchetha
    Aguda, Toyin
    Shah, Sameena
    [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023: 3054-3063
  • [6] A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora
    Zweigenbaum, Pierre
    Sharoff, Serge
    Rapp, Reinhard
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018: 3828-3833
  • [7] Multilingual Entity, Relation, Event and Human Value Extraction
    Li, Manling
    Lin, Ying
    Hoover, Joseph
    Whitehead, Spencer
    Voss, Clare R.
    Dehghani, Morteza
    Ji, Heng
    [J]. NAACL HLT 2019: THE 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE DEMONSTRATIONS SESSION, 2019: 110-115
  • [8] DReD-A Descriptive Relation Dataset for Expanding Relation Extraction
    Markewich, Logan
    Xing, Yubin
    Lee, Roy Ka-Wei
    Li, Zhi
    Ko, Seokbum
    [J]. IEEE Transactions on Artificial Intelligence, 2023, 4 (06): 1494-1503
  • [9] FinRED: A Dataset for Relation Extraction in Financial Domain
    Sharma, Soumya
    Nayak, Tapas
    Bose, Arusarka
    Meena, Ajay Kumar
    Dasgupta, Koustuv
    Ganguly, Niloy
    Goyal, Pawan
    [J]. COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022: 595-597
  • [10] BioRED: a rich biomedical relation extraction dataset
    Luo, Ling
    Lai, Po-Ting
    Wei, Chih-Hsuan
    Arighi, Cecilia N.
    Lu, Zhiyong
    [J]. BRIEFINGS IN BIOINFORMATICS, 2022, 23 (05)