TwitterBERT: Framework for Twitter Sentiment Analysis Based on Pre-trained Language Model Representations

Cited by: 22
Authors
Azzouza, Noureddine [1 ]
Akli-Astouati, Karima [1 ]
Ibrahim, Roliana [2 ]
Affiliations
[1] Univ Sci & Technol Houari Boumediene, FEI Dept Comp Sci, RIIMA Lab, Algiers, Algeria
[2] Univ Teknol Malaysia UTM, Fac Engn, Sch Comp, Johor Baharu 81310, Johor, Malaysia
Keywords
Twitter Sentiment Analysis; Word embedding; CNN; LSTM; BERT;
DOI
10.1007/978-3-030-33582-3_41
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Sentiment analysis is a long-standing topic in language understanding, yet the neural networks deployed for it remain limited in some respects. Most current studies identify sentiment by focusing on vocabulary and syntax. The task is well established in Natural Language Processing (NLP), where Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have been employed to obtain notable results. In this study, we propose a four-phase framework for Twitter Sentiment Analysis. The setup uses the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model as an encoder for generating sentence representations. To exploit this model more effectively, we deploy various classification models on top of it. Additionally, we concatenate pre-trained word-embedding representations with the BERT representation to enhance sentiment classification. Experimental results show improvements over the baseline framework on all datasets; for example, our best model attains an F1-score of 71.82% on the SemEval 2017 dataset. A comparative analysis of the experimental results offers recommendations on choosing pre-training steps to obtain improved results. These outcomes confirm the effectiveness of our system.
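The abstract's core idea of concatenating pre-trained word embeddings with a BERT sentence representation before classification can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the encoder functions below are hypothetical stand-ins (random vectors in place of real BERT and word2vec/GloVe outputs), and the 768/300 dimensions and 3-class softmax head are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bert_sentence_vector(tokens):
    """Stand-in for a BERT encoder: returns a 768-d sentence vector."""
    return rng.standard_normal(768)

def avg_word_embedding(tokens, dim=300):
    """Stand-in for pre-trained word embeddings, averaged over tokens."""
    return np.mean(rng.standard_normal((len(tokens), dim)), axis=0)

def fused_features(tokens):
    """Concatenate both representations, as the framework proposes."""
    return np.concatenate([bert_sentence_vector(tokens),
                           avg_word_embedding(tokens)])

def classify(features, W, b):
    """Softmax head over 3 sentiment classes (neg / neutral / pos)."""
    z = W @ features + b
    e = np.exp(z - z.max())          # numerically stable softmax
    return e / e.sum()

tokens = "great match tonight".split()
x = fused_features(tokens)           # 768 + 300 = 1068 features
W = rng.standard_normal((3, x.size)) * 0.01
probs = classify(x, W, np.zeros(3))
```

In a real system the classifier weights would be trained on labeled tweets, and the concatenation gives the head access to both contextual (BERT) and static (word-embedding) signals, which is the combination the paper credits for the improved F1-score.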
Pages: 428-437
Page count: 10