Building a Sentiment Corpus of Tweets in Brazilian Portuguese

Authors
Brum, Henrico Bertini [1]
Volpe Nunes, Maria das Gracas [1]
Affiliations
[1] Univ Sao Paulo, Interinst Ctr Computat Linguist NILC, Inst Math & Comp Sci, Sao Paulo, Brazil
Keywords
Sentiment Analysis; Corpus Annotation; Social Media
DOI
Not available
Chinese Library Classification number
TP39 [Computer Applications]
Discipline classification codes
081203; 0835
Abstract
The large amount of data available on social media, forums, and websites motivates research in several areas of Natural Language Processing, such as sentiment analysis. The popularity of the area, due to its subjective and semantic characteristics, motivates research on novel methods and approaches for classification. Hence, there is a high demand for datasets in different domains and languages. This paper introduces TweetSentBR, a manually annotated sentiment corpus for Brazilian Portuguese with 15,000 sentences in the TV show domain. The sentences were labeled in three classes (positive, neutral, and negative) by seven annotators, following guidelines from the literature for ensuring annotation reliability. We also ran baseline experiments on polarity classification using six machine learning classifiers, reaching 80.38% F-measure in binary classification and 64.87% when including the neutral class. We also performed experiments on similar datasets for the polarity classification task for comparison with this corpus.
Pages: 4167-4172
Number of pages: 6