Multilingual Entity and Relation Extraction Dataset and Model

Authors
Seganti, Alessandro [1, 2]
Firlag, Klaudia [1]
Skowronska, Helena [1, 3]
Satlawa, Michal [1]
Andruszkiewicz, Piotr [1, 4]
Affiliations
[1] Samsung R&D Institute Poland, Warsaw, Poland
[2] Equinix, Redwood City, CA 94065, USA
[3] NextSell, ODC Group, Warsaw, Poland
[4] Warsaw University of Technology, Warsaw, Poland
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel dataset and model for joint entity and relation extraction in a multilingual setting. The SMiLER dataset consists of 1.1M annotated sentences covering 36 relation types in 14 languages. To the best of our knowledge, it is currently both the largest and the most comprehensive dataset of this type. We introduce HERBERTa, a pipeline that combines two independent BERT models: one for sequence classification and the other for entity tagging. On this dataset, the model achieves a micro-F1 score of 81.49 for English, which is close to the result of SpERT, the current state of the art on CoNLL.
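As a rough illustration of the two-stage design described in the abstract, the following Python sketch chains a sentence-level relation classifier with an entity tagger and scores the extracted triples with micro-averaged F1. It is a minimal sketch, not the HERBERTa implementation: both stages are toy callables standing in for fine-tuned BERT models, and the names `TwoStagePipeline` and `micro_f1`, the `B-SUBJ`/`B-OBJ` tag scheme, the relation label `authored`, and the choice to pass the predicted relation to the tagger are all hypothetical assumptions. It also assumes one relation per sentence and exact-match scoring of (subject, relation, object) triples, which may differ from how the paper computes its micro-F1.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple

# A (subject, relation, object) triple; one per sentence in this sketch.
Triple = Tuple[str, str, str]


@dataclass
class TwoStagePipeline:
    """Hypothetical two-stage extractor. Each stage is a plain callable here;
    in a real system each would wrap its own fine-tuned BERT model."""

    # Stage 1: sentence-level (sequence) classification into a relation label.
    classify_relation: Callable[[str], str]
    # Stage 2: token tagging, assumed here to receive the predicted relation.
    # Returns one tag per token: "B-SUBJ", "I-SUBJ", "B-OBJ", "I-OBJ" or "O".
    tag_entities: Callable[[List[str], str], List[str]]

    def extract(self, sentence: str) -> Triple:
        tokens = sentence.split()  # naive whitespace tokenization
        relation = self.classify_relation(sentence)
        tags = self.tag_entities(tokens, relation)
        # Join all subject-tagged and all object-tagged tokens into one span each.
        subj = " ".join(t for t, tag in zip(tokens, tags) if tag.endswith("SUBJ"))
        obj = " ".join(t for t, tag in zip(tokens, tags) if tag.endswith("OBJ"))
        return (subj, relation, obj)


def micro_f1(gold: Sequence[Set[Triple]], pred: Sequence[Set[Triple]]) -> float:
    """Micro-averaged F1: pool true positives, false positives and false
    negatives over all sentences before computing precision and recall."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy stand-ins for the two models (hypothetical, for illustration only).
def toy_classifier(sentence: str) -> str:
    return "authored" if " wrote " in f" {sentence} " else "no_relation"


def toy_tagger(tokens: List[str], relation: str) -> List[str]:
    tags = ["O"] * len(tokens)
    if relation != "no_relation":
        # First token is tagged as the subject, last token as the object.
        tags[0], tags[-1] = "B-SUBJ", "B-OBJ"
    return tags


pipeline = TwoStagePipeline(toy_classifier, toy_tagger)
print(pipeline.extract("Orwell wrote 1984"))  # ('Orwell', 'authored', '1984')

gold = [{("Orwell", "authored", "1984")}, {("Tolkien", "authored", "The Hobbit")}]
pred = [{pipeline.extract("Orwell wrote 1984")}, {pipeline.extract("Tolkien wrote Hobbit")}]
print(micro_f1(gold, pred))  # 0.5
```

Micro-averaging pools counts across all sentences, so frequent relation types weigh more heavily than rare ones; a macro average would instead average per-relation F1 scores.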
Pages: 1946-1955 (10 pages)