Deep Persian sentiment analysis: Cross-lingual training for low-resource languages

被引:24
|
作者
Ghasemi, Rouzbeh [1 ]
Ashrafi Asli, Seyed Arad [1 ]
Momtazi, Saeedeh [1 ]
机构
[1] Amirkabir Univ Technol, Dept Comp Engn, Tehran, Iran
关键词
Convolutional neural network; cross-lingual embedding; long short-term memory network; low-resource languages; sentiment analysis; STRENGTH DETECTION; IMPACT;
D O I
10.1177/0165551520962781
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
With the advent of deep neural models in natural language processing tasks, having a large amount of training data plays an essential role in achieving accurate models. Creating valid training data, however, is a challenging issue in many low-resource languages. This problem results in a significant difference between the accuracy of available natural language processing tools for low-resource languages compared with rich languages. To address this problem in the sentiment analysis task in the Persian language, we propose a cross-lingual deep learning framework to benefit from available training data of English. We deployed cross-lingual embedding to model sentiment analysis as a transfer learning model which transfers a model from a rich-resource language to low-resource ones. Our model is flexible to use any cross-lingual word embedding model and any deep architecture for text classification. Our experiments on English Amazon dataset and Persian Digikala dataset using two different embedding models and four different classification networks show the superiority of the proposed model compared with the state-of-the-art monolingual techniques. Based on our experiment, the performance of Persian sentiment analysis improves 22% in static embedding and 9% in dynamic embedding. Our proposed model is general and language-independent; that is, it can be used for any low-resource language, once a cross-lingual embedding is available for the source-target language pair. Moreover, by benefitting from word-aligned cross-lingual embedding, the only required data for a reliable cross-lingual embedding is a bilingual dictionary that is available between almost all languages and the English language, as a potential source language.
引用
收藏
页码:449 / 462
页数:14
相关论文
共 50 条
  • [1] Intent detection and slot filling for Persian: Cross-lingual training for low-resource languages
    Zadkamali, Reza
    Momtazi, Saeedeh
    Zeinali, Hossein
    [J]. NATURAL LANGUAGE PROCESSING, 2024,
  • [2] Cross-Lingual Morphological Tagging for Low-Resource Languages
    Buys, Jan
    Botha, Jan A.
    [J]. PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1954 - 1964
  • [3] Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages
    Ecker, Stefan
    Horbach, Andrea
    Thater, Stefan
    [J]. LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 1709 - 1717
  • [4] Deep Ensemble Network for Sentiment Analysis in Bi-lingual Low-resource Languages
    Roy, Pradeep Kumar
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (01)
  • [5] Adversarial Cross-Lingual Transfer Learning for Slot Tagging of Low-Resource Languages
    He, Keqing
    Yan, Yuanmeng
    Xu, Weiran
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [6] Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD
    Taghizadeh, Nasrin
    Faili, Hesham
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2016, 56 : 61 - 87
  • [7] Cross-lingual offensive speech identification with transfer learning for low-resource languages
    Shi, Xiayang
    Liu, Xinyi
    Xu, Chun
    Huang, Yuanyuan
    Chen, Fang
    Zhu, Shaolin
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [8] Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages
    Schlichtkrull, Michael Sejr
    Sogaard, Anders
    [J]. 15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 220 - 229
  • [9] Cross-Lingual Sentiment Analysis for Indian Regional Languages
    Impana, P.
    Kallimani, Jagadish S.
    [J]. 2017 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER, AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2017, : 867 - 872
  • [10] Cross-Lingual Propagation for Deep Sentiment Analysis
    Dong, Xin
    de Melo, Gerard
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 5771 - 5778