BERT-Based Sentiment Analysis for Low-Resourced Languages: A Case Study of Urdu Language

Cited by: 2
Authors
Ashraf, Muhammad Rehan [1 ,2 ]
Jana, Yasmeen [2 ]
Umer, Qasim [2 ,3 ]
Jaffar, M. Arfan [1 ]
Chung, Sungwook [4 ]
Ramay, Waheed Yousuf [5 ]
Affiliations
[1] Super Univ, Dept Comp Sci, Lahore 54000, Pakistan
[2] COMSATS Univ Islamabad, Dept Comp Sci, Vehari 61000, Pakistan
[3] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[4] Changwon Natl Univ, Dept Comp Engn, Chang Won 51140, South Korea
[5] Air Univ, Dept Comp Sci, Multan 60000, Pakistan
Source
IEEE ACCESS | 2023 / Volume 11
Keywords
Sentiment analysis; Support vector machines; Social networking (online); Sports; Blogs; Encoding; Natural language processing; Linguistics; Urdu; BERT; classification; sentiment analysis; ROMAN URDU; CLASSIFICATION; MACHINE;
DOI
10.1109/ACCESS.2023.3322101
CLC number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Sentiment analysis is important in research because it provides insight into public opinion. However, most sentiment analysis studies focus on English, leaving a research gap for low-resourced or regional languages such as Persian, Pashto, and Urdu. Moreover, computational linguists face the challenge of developing lexical resources for these languages. This paper therefore presents a deep learning-based approach for Urdu Text Sentiment Analysis (USA-BERT) that leverages Bidirectional Encoder Representations from Transformers (BERT), and introduces the Urdu Dataset for Sentiment Analysis-23 (UDSA-23). USA-BERT first preprocesses the Urdu reviews using BERT-Tokenizer. Second, it creates BERT embeddings for each Urdu review. Third, given the BERT embeddings, it fine-tunes a deep learning classifier (BERT). Finally, it applies the Pareto principle to two datasets, the state-of-the-art UCSA-21 and the new UDSA-23, to assess USA-BERT. The assessment results show that USA-BERT significantly surpasses existing methods, improving accuracy and F-measure by up to 26.09% and 25.87%, respectively.
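The abstract does not spell out how the Pareto principle is applied or how the scores are computed. A minimal sketch of one plausible reading follows, assuming the principle means an 80/20 train/test split and that F-measure is the usual harmonic mean of precision and recall. The helper names (`pareto_split`, `accuracy`, `f_measure`), the toy reviews, and the label strings are hypothetical and are not taken from the paper.

```python
import random


def pareto_split(samples, train_fraction=0.8, seed=0):
    """Shuffle samples and split them into train/test parts.

    Assumption: the "Pareto principle" is read as an 80/20 split.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive="positive"):
    """Harmonic mean of precision and recall for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data standing in for labelled Urdu reviews.
reviews = [(f"review {i}", "positive" if i % 2 == 0 else "negative")
           for i in range(10)]
train, test = pareto_split(reviews)

# Hypothetical gold labels and classifier predictions.
y_true = ["positive", "negative", "positive", "negative"]
y_pred = ["positive", "positive", "positive", "negative"]
acc = accuracy(y_true, y_pred)
f1 = f_measure(y_true, y_pred)
```

In a real evaluation, `y_pred` would come from the fine-tuned classifier applied to the held-out 20% of each dataset; the fixed seed only keeps the split reproducible.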
Pages: 110245 - 110259
Page count: 15