Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon

Authors
Koto, Fajri [1]
Beck, Tilman [2]
Talat, Zeerak [1]
Gurevych, Iryna [1]
Baldwin, Timothy [1, 3]
Affiliations
[1] MBZUAI, Dept Nat Language Proc, Abu Dhabi, U Arab Emirates
[2] Tech Univ Darmstadt, Ubiquitous Knowledge Proc Lab, Darmstadt, Germany
[3] Univ Melbourne, Melbourne, Vic, Australia
Keywords
NORMS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Improving the capabilities of multilingual language models in low-resource languages is generally difficult because large-scale data in those languages is scarce. In this paper, we relax the reliance on texts in low-resource languages by using multilingual lexicons in pretraining to enhance multilingual capabilities. Specifically, we focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets. We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves better zero-shot performance than both models fine-tuned on English sentiment datasets and large language models such as GPT-3.5, BLOOMZ, and XGLM. These findings hold across settings ranging from unseen low-resource languages to code-mixed scenarios involving high-resource languages.
Pages: 298-320
Page count: 23