Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese

Cited by: 3
Authors
de Lima Santos, Diego Bernardes [1]
de Carvalho Dutra, Frederico Giffoni [2]
Parreiras, Fernando Silva [3]
Brandao, Wladmir Cardoso [1]
Affiliations
[1] Pontifical Catholic Univ Minas Gerais PUC Minas, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Co Energet Minas Gerais CEMIG, Belo Horizonte, MG, Brazil
[3] FUMEC Univ, Lab Adv Informat Syst, Belo Horizonte, MG, Brazil
Keywords
Named Entity Recognition; Text Embedding; Neural Network; Transformer; Multilingual; Portuguese; MODELS;
DOI
10.5220/0010443204730483
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recent state-of-the-art named entity recognition approaches are based on deep neural networks that use an attention mechanism to learn how to extract named entities from relevant fragments of text. Usually, training models in a specific language leads to effective recognition, but it requires a lot of time and computational resources. Fine-tuning a pre-trained multilingual model can be simpler and faster, but it is an open question how effective the resulting recognition model can be. This article exploits multilingual models for named entity recognition by adapting and training transformer-based architectures for Portuguese, a challenging and complex language. Experimental results show that multilingual transformer-based text embedding approaches fine-tuned with a large dataset outperform state-of-the-art transformer-based models trained specifically for Portuguese. In particular, we build a comprehensive dataset from different versions of HAREM to train our multilingual transformer-based text embedding approach, which achieves 88.0% precision and 87.8% F1 in named entity recognition for Portuguese, with gains of up to 9.89% in precision and 11.60% in F1 compared to the state-of-the-art monolingual approach trained specifically for Portuguese.
Pages: 473-483
Number of pages: 11