Sentiment analysis in Portuguese tweets: an evaluation of diverse word representation models

Authors
Vianna, Daniela [1, 2]
Carneiro, Fernando [3 ]
Carvalho, Jonnathan [4 ]
Plastino, Alexandre [3 ]
Paes, Aline [3 ]
Affiliations
[1] Univ Fed Amazonas UFAM, Inst Comp, Manaus, AM, Brazil
[2] Jusbrasil, Salvador, Brazil
[3] Univ Fed Fluminense UFF, Inst Comp, Niteroi, RJ, Brazil
[4] Inst Fed Fluminense IFF, Itaperuna, RJ, Brazil
Keywords
Sentiment analysis; Word representation; Brazilian Portuguese tweets; Language models; Social media
DOI
10.1007/s10579-023-09661-4
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In recent years, the number of social networks worldwide has grown steadily. Among them, Twitter has consolidated its position as one of the most influential social platforms, with Brazilian Portuguese speakers holding the fifth position in number of users. Because tweets are written in an informal linguistic style, extracting information from them is challenging for Natural Language Processing (NLP) tasks such as sentiment analysis. In this work, we frame sentiment analysis as a binary (positive and negative) and a multiclass (positive, negative, and neutral) classification task at the level of tweets written in Portuguese. Following a feature extraction approach, embeddings are first gathered for a tweet and then given as input to train a classifier. This study was designed to evaluate the effectiveness of different word representations, from the original pre-trained language model to continued pre-training strategies, in improving the predictive performance of sentiment classification, using three different classification algorithms and eight datasets of Portuguese tweets. Because no language model specific to Brazilian Portuguese tweets is available, we expanded our evaluation to consider six different embeddings: fastText, GloVe, Word2Vec, BERT-multilingual (mBERT), BERTweet, and BERTimbau. The experiments showed that BERTimbau, whose embeddings are trained from scratch solely on the target language, Portuguese, outperforms the static representations (fastText, GloVe, and Word2Vec) and the Transformer-based models mBERT and BERTweet. In addition, we show that extracting the contextualized embeddings without any adjustment to the pre-trained language model is the best approach for most datasets.
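The feature extraction approach described in the abstract (gather an embedding for each tweet, then train a classifier on those fixed features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the 2-dimensional vectors in `TOY_VECTORS` are invented, the example tweets are made up, and the nearest-centroid classifier is a stand-in, since the abstract does not name the three classification algorithms the study used.

```python
from math import dist

# Hypothetical 2-d "static embeddings", invented for this sketch.
# Real models such as fastText, GloVe, or Word2Vec use hundreds of dimensions.
TOY_VECTORS = {
    "amei": (0.9, 0.1),
    "ótimo": (0.8, 0.2),
    "odiei": (0.1, 0.9),
    "péssimo": (0.2, 0.8),
    "filme": (0.5, 0.5),
}


def embed_tweet(tweet):
    """Average the vectors of the known tokens; unknown tokens are skipped."""
    vectors = [TOY_VECTORS[tok] for tok in tweet.lower().split() if tok in TOY_VECTORS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def fit_centroids(tweets, labels):
    """Compute one centroid per class from the fixed tweet embeddings."""
    grouped = {}
    for tweet, label in zip(tweets, labels):
        grouped.setdefault(label, []).append(embed_tweet(tweet))
    return {
        label: tuple(sum(axis) / len(vecs) for axis in zip(*vecs))
        for label, vecs in grouped.items()
    }


def predict(centroids, tweet):
    """Return the label whose centroid is closest to the tweet embedding."""
    features = embed_tweet(tweet)
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Made-up training data for the binary (positive/negative) setting.
train_tweets = ["amei filme", "ótimo filme", "odiei filme", "péssimo filme"]
train_labels = ["positive", "positive", "negative", "negative"]
centroids = fit_centroids(train_tweets, train_labels)

print(predict(centroids, "amei"))
print(predict(centroids, "péssimo"))
```

The embedding step is never updated during training, which mirrors the "no adjustment to the pre-trained language model" setting mentioned in the abstract. Swapping `embed_tweet` for a contextualized encoder would leave the classifier code unchanged.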
Pages: 223-272
Number of pages: 50
Source: Language Resources and Evaluation, 2024, 58: 223-272