BERT-Based Sentiment Analysis for Low-Resourced Languages: A Case Study of Urdu Language

Cited by: 2
Authors
Ashraf, Muhammad Rehan [1, 2]
Jana, Yasmeen [2]
Umer, Qasim [2, 3]
Jaffar, M. Arfan [1 ]
Chung, Sungwook [4 ]
Ramay, Waheed Yousuf [5 ]
Affiliations
[1] Super Univ, Dept Comp Sci, Lahore 54000, Pakistan
[2] COMSATS Univ Islamabad, Dept Comp Sci, Vehari 61000, Pakistan
[3] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[4] Changwon Natl Univ, Dept Comp Engn, Chang Won 51140, South Korea
[5] Air Univ, Dept Comp Sci, Multan 60000, Pakistan
Source
IEEE Access | 2023 / Vol. 11
Keywords
Sentiment analysis; Support vector machines; Social networking (online); Sports; Blogs; Encoding; Natural language processing; Linguistics; Urdu; BERT; Classification; Roman Urdu; Machine
DOI
10.1109/ACCESS.2023.3322101
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Sentiment analysis gives researchers valuable insight into public opinion. However, most sentiment analysis studies focus on English, leaving a research gap for low-resourced and regional languages such as Persian, Pashto, and Urdu. Moreover, computational linguists face the challenge of developing lexical resources for these languages. To address this, the paper presents USA-BERT, a deep learning-based approach to Urdu text sentiment analysis that leverages Bidirectional Encoder Representations from Transformers (BERT), and introduces the Urdu Dataset for Sentiment Analysis-23 (UDSA-23). USA-BERT first preprocesses the Urdu reviews with the BERT tokenizer. Second, it creates BERT embeddings for each Urdu review. Third, given the BERT embeddings, it fine-tunes a deep learning classifier (BERT). Finally, it applies the Pareto principle to two datasets, the state-of-the-art UCSA-21 and UDSA-23, to assess USA-BERT. The results show that USA-BERT significantly outperforms existing methods, improving accuracy and F-measure by up to 26.09% and 25.87%, respectively.
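The evaluation step in the abstract can be illustrated with a minimal sketch. It assumes that "Pareto principle" refers to an 80/20 train/test split, which the abstract does not spell out, and it computes accuracy and a per-class F-measure from gold and predicted labels. The function names (`pareto_split`, `accuracy`, `f_measure`) and the toy labels are hypothetical and are not taken from the paper; the tokenization and fine-tuning steps are not reproduced here.

```python
import random
from typing import List, Sequence, Tuple


def pareto_split(samples: Sequence, train_fraction: float = 0.8,
                 seed: int = 0) -> Tuple[List, List]:
    """Shuffle the samples and split them into train/test parts.

    Assumption: the 80/20 ratio is our reading of "Pareto principle".
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f_measure(gold: Sequence[str], pred: Sequence[str], positive: str) -> float:
    """F1 score (harmonic mean of precision and recall) for one class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from UCSA-21 or UDSA-23.
    gold = ["pos", "neg", "pos", "neg"]
    pred = ["pos", "pos", "pos", "neg"]
    train, test = pareto_split(range(10))
    print(len(train), len(test))
    print(accuracy(gold, pred), f_measure(gold, pred, "pos"))
```

In practice the predictions would come from the fine-tuned classifier evaluated on the held-out 20% of each dataset; whether the paper averages the F-measure across classes is not stated in the abstract.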
Pages: 110245-110259
Page count: 15