Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese

Cited by: 3
Authors
de Lima Santos, Diego Bernardes [1]
de Carvalho Dutra, Frederico Giffoni [2]
Parreiras, Fernando Silva [3]
Brandao, Wladmir Cardoso [1]
Affiliations
[1] Pontifical Catholic Univ Minas Gerais PUC Minas, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Co Energet Minas Gerais CEMIG, Belo Horizonte, MG, Brazil
[3] FUMEC Univ, Lab Adv Informat Syst, Belo Horizonte, MG, Brazil
Keywords
Named Entity Recognition; Text Embedding; Neural Network; Transformer; Multilingual; Portuguese; MODELS
DOI
10.5220/0010443204730483
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Recent state-of-the-art named entity recognition approaches are based on deep neural networks that use an attention mechanism to learn how to extract named entities from relevant fragments of text. Training a model on a specific language usually leads to effective recognition, but it requires a lot of time and computational resources. Fine-tuning a pre-trained multilingual model can be simpler and faster, but it is unclear how effective the resulting recognition model can be. This article exploits multilingual models for named entity recognition by adapting and training transformer-based architectures for Portuguese, a challenging and complex language. Experimental results show that multilingual transformer-based text embedding approaches fine-tuned with a large dataset outperform state-of-the-art transformer-based models trained specifically for Portuguese. In particular, we build a comprehensive dataset from different versions of HAREM to train our multilingual transformer-based text embedding approach, which achieves 88.0% precision and 87.8% F1 in named entity recognition for Portuguese, with gains of up to 9.89% in precision and 11.60% in F1 compared to the state-of-the-art monolingual approach trained specifically for Portuguese.
Pages: 473-483
Page count: 11