BERT-Based Sentiment Analysis for Low-Resourced Languages: A Case Study of Urdu Language

Cited by: 4
Authors
Ashraf, Muhammad Rehan [1, 2]
Jana, Yasmeen [2]
Umer, Qasim [2, 3]
Jaffar, M. Arfan [1]
Chung, Sungwook [4]
Ramay, Waheed Yousuf [5]
Affiliations
[1] Super Univ, Dept Comp Sci, Lahore 54000, Pakistan
[2] COMSATS Univ Islamabad, Dept Comp Sci, Vehari 61000, Pakistan
[3] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[4] Changwon Natl Univ, Dept Comp Engn, Chang Won 51140, South Korea
[5] Air Univ, Dept Comp Sci, Multan 60000, Pakistan
Source
IEEE Access | 2023 / Volume 11
Keywords
Sentiment analysis; Support vector machines; Social networking (online); Sports; Blogs; Encoding; Natural language processing; Linguistics; Urdu; BERT; Classification; Roman Urdu; Machine
DOI
10.1109/ACCESS.2023.3322101
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis is important to research because it provides insight into public opinion. However, most sentiment analysis studies focus on English, leaving a research gap for low-resourced and regional languages, e.g., Persian, Pashto, and Urdu. Moreover, computational linguists face the challenge of developing lexical resources for these languages. In light of this, this paper presents a deep learning-based approach for Urdu Text Sentiment Analysis (USA-BERT) that leverages Bidirectional Encoder Representations from Transformers (BERT), and introduces an Urdu Dataset for Sentiment Analysis-23 (UDSA-23). USA-BERT first preprocesses the Urdu reviews with the BERT tokenizer. Second, it creates BERT embeddings for each Urdu review. Third, given the BERT embeddings, it fine-tunes a deep learning classifier (BERT). Finally, it applies the Pareto principle to two datasets (the state-of-the-art UCSA-21 and UDSA-23) to assess USA-BERT. The assessment results show that USA-BERT significantly surpasses existing methods, improving accuracy and F-measure by up to 26.09% and 25.87%, respectively.
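The evaluation step in the abstract can be illustrated with a minimal sketch. It assumes that "the Pareto principle" means an 80/20 train/test split, which the abstract does not spell out, and that the F-measure is the usual harmonic mean of precision and recall. The function names, the fixed seed, and the label string "positive" are hypothetical and are not taken from the paper; no BERT model is involved here.

```python
import random


def pareto_split(samples, train_fraction=0.8, seed=0):
    """Shuffle and split samples into train/test parts (80/20 by default).

    Assumption: the 80/20 ratio is one reading of the Pareto principle,
    not a detail confirmed by the abstract.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def f_measure(gold, predicted, positive="positive"):
    """Harmonic mean of precision and recall for one (hypothetical) label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In use, a classifier would be trained on the first part returned by `pareto_split` and scored on the second with `accuracy` and `f_measure`.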
Pages: 110245-110259
Page count: 15