Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese

Cited by: 3
Authors
de Lima Santos, Diego Bernardes [1]
de Carvalho Dutra, Frederico Giffoni [2]
Parreiras, Fernando Silva [3]
Brandao, Wladmir Cardoso [1]
Affiliations
[1] Pontifical Catholic Univ Minas Gerais PUC Minas, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Co Energet Minas Gerais CEMIG, Belo Horizonte, MG, Brazil
[3] FUMEC Univ, Lab Adv Informat Syst, Belo Horizonte, MG, Brazil
Keywords
Named Entity Recognition; Text Embedding; Neural Network; Transformer; Multilingual; Portuguese; MODELS
DOI
10.5220/0010443204730483
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Recent state-of-the-art named entity recognition approaches are based on deep neural networks that use an attention mechanism to learn how to extract named entities from relevant fragments of text. Training a model for a specific language usually leads to effective recognition, but it requires a lot of time and computational resources. Fine-tuning a pre-trained multilingual model can be simpler and faster, but it is an open question how effective the resulting recognition model can be. This article exploits multilingual models for named entity recognition by adapting and training transformer-based architectures for Portuguese, a challenging and complex language. Experimental results show that multilingual transformer-based text embedding approaches fine-tuned with a large dataset outperform state-of-the-art transformer-based models trained specifically for Portuguese. In particular, we build a comprehensive dataset from different versions of HAREM to train our multilingual transformer-based text embedding approach, which achieves 88.0% precision and 87.8% F1 in named entity recognition for Portuguese, with gains of up to 9.89% in precision and 11.60% in F1 compared to the state-of-the-art monolingual approach trained specifically for Portuguese.
Pages: 473-483
Page count: 11
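
The abstract reports its results as precision and F1. As a rough illustration of how such figures are commonly computed for named entity recognition, the sketch below scores predictions at the entity level with exact span matching over BIO tags. This is an assumption: the record does not state which scoring scheme the authors used, and the function names, tag labels (`PER`, `LOC`), and example sentence are hypothetical.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence ("O", "B-X", "I-X")."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag == "O":
            start, label = None, None
        else:
            # "B-X" opens an entity; a stray "I-X" is leniently treated the same way.
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction counts only if span and type both match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: "Diego mora em Belo Horizonte"
gold = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
pred = ["B-PER", "O", "O", "B-LOC", "O"]  # location span cut short
print(precision_recall_f1(gold, pred))
```

In this example the truncated location span counts as both a false positive and a false negative, which is why exact-match scoring is stricter than token-level accuracy.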