Resource Construction and Ensemble Learning Based Sentiment Analysis for the Low-resource Language Uyghur

Cited by: 1
Authors
Yusup, Azragul [1, 2]
Chen, Degang [1]
Ge, Yifei [1]
Mao, Hongliang [1]
Wang, Nujian [1]
Affiliations
[1] Xinjiang Normal Univ, Coll Comp Sci & Technol, Urumqi, Peoples R China
[2] Natl Language Resource Monitoring & Res Ctr Minor, Beijing, Peoples R China
Source
JOURNAL OF INTERNET TECHNOLOGY | 2023 / Vol. 24 / No. 04
Keywords
Low-resource language; Uyghur; HTL; Stacking ensemble learning; Sentiment analysis
DOI
10.53106/160792642023072404018
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To address the scarcity of sentiment analysis corpora for low-resource languages, this paper proposes HTL, a sentence-level sentiment analysis resource conversion method that uses syntactic-semantic knowledge of the low-resource language Uyghur to convert a high-resource corpus into a low-resource corpus. During conversion, a k-fold cross-filtering method is proposed to reduce the distortion of data samples by selecting high-quality samples for conversion. The result is the Uyghur sentiment analysis dataset USD. A baseline on this dataset with an LSTM model reaches an accuracy of 81.07% and an F1 of 81.13%, which can serve as a reference for constructing low-resource language corpora. The paper also proposes SALREL, a sentiment analysis model based on logistic regression ensemble learning. It uses several lightweight network models (TextCNN, RNN, and RCNN) as base models and a logistic regression meta-model to combine them. On the test set, SALREL reaches an accuracy of 82.17% and an F1 of 81.86%, and the experimental results show that the method can effectively improve performance on the Uyghur sentiment analysis task.
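The stacking idea in the abstract (base models whose outputs are combined by a logistic regression meta-model) can be sketched roughly as follows. This is a minimal illustration, not the authors' SALREL implementation: the three probability columns merely stand in for TextCNN, RNN, and RCNN outputs, all numbers are made up, and the function names `fit_meta_model` and `predict` are hypothetical. It assumes the base-model probabilities were already produced on held-out folds.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_meta_model(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression meta-model with batch gradient descent.

    base_probs: one row per sentence, one column per base model,
                each value being that model's P(positive).
    labels:     gold sentiment labels (1 = positive, 0 = negative).
    """
    n_features = len(base_probs[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(base_probs, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(z) - y  # gradient of log loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    # Combine the base-model probabilities into one final label.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Hypothetical out-of-fold P(positive) values; columns stand in for
# TextCNN, RNN, and RCNN. These are illustrative, not results from the paper.
train_probs = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.1],
    [0.1, 0.4, 0.2],
    [0.3, 0.2, 0.3],
]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_model(train_probs, train_labels)
train_preds = [predict(weights, bias, x) for x in train_probs]
```

In a full pipeline, each row of `train_probs` would come from base models trained on the other folds, so that the meta-model never sees predictions a base model made on its own training data.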
Pages: 1009-1016
Number of pages: 8
Related papers
50 in total
  • [1] Sentiment Analysis of Low-Resource Language Literature Using Data Processing and Deep Learning
    Ali, Aizaz
    Khan, Maqbool
    Khan, Khalil
    Khan, Rehan Ullah
    Aloraini, Abdulrahman
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 79 (01): 713-733
  • [2] Low-Resource Aspect-Based Sentiment Analysis: A Survey
    Chen Z.
    Qian T.-Y.
    Li W.-L.
    Zhang T.
    Zhou S.
    Zhong M.
    Zhu Y.-Y.
    Liu M.-C.
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2023, 46 (07): 1445-1472
  • [3] Deep Ensemble Network for Sentiment Analysis in Bi-lingual Low-resource Languages
    Roy, Pradeep Kumar
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (01)
  • [4] Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis
    Dhananjaya, Vinura
    Ranathunga, Surangika
    Jayasena, Sanath
    [J]. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024
  • [5] Research on Acoustic Modeling of Low Resource Uyghur Language Based on Transfer Learning
    Wang, Tengjun
    Zhang, Fan
    Dai, Yugang
    Xu, Tao
    [J]. PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21), 2021
  • [6] Sentiment analysis on a low-resource language dataset using multimodal representation learning and cross-lingual transfer learning
    Gladys, A. Aruna
    Vetriselvi, V.
    [J]. APPLIED SOFT COMPUTING, 2024, 157
  • [7] Construction and Evaluation of Sentiment Datasets for Low-Resource Languages: The Case of Uzbek
    Kuriyozov, Elmurod
    Matlatipov, Sanatbek
    Alonso, Miguel A.
    Gomez-Rodriguez, Carlos
    [J]. HUMAN LANGUAGE TECHNOLOGY: CHALLENGES FOR COMPUTER SCIENCE AND LINGUISTICS, LTC 2019, 2022, 13212: 232-243
  • [8] Learning Bilingual Lexicon for Low-Resource Language Pairs
    Zhu, ShaoLin
    Li, Xiao
    Yang, YaTing
    Wang, Lei
    Mi, ChengGang
    [J]. NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2017, 2018, 10619: 760-770
  • [9] Building lexicon-based sentiment analysis model for low-resource languages
    Mohammed, Idi
    Prasad, Rajesh
    [J]. METHODSX, 2023, 11
  • [10] Autoregressive Feature Extraction with Topic Modeling for Aspect-based Sentiment Analysis of Arabic as a Low-resource Language
    Sweidan, Asmaa Hashem
    El-Bendary, Nashwa
    Elhariri, Esraa
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (02)