A joint learning approach with knowledge injection for zero-shot cross-lingual hate speech detection

被引:43
|
作者
Pamungkas, Endang Wahyu [1 ]
Basile, Valerio [1 ]
Patti, Viviana [1 ]
机构
[1] Univ Turin, Dept Comp Sci, Turin, Italy
关键词
Hate speech detection; Cross-lingual classification; Social media; Transfer learning; Zero-shot learning; ETHNOPHAULISMS;
D O I
10.1016/j.ipm.2021.102544
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Hate speech is an increasingly important societal issue in the era of digital communication. Hateful expressions often make use of figurative language and, although they represent, in some sense, the dark side of language, they are also often prime examples of creative use of language. While hate speech is a global phenomenon, current studies on automatic hate speech detection are typically framed in a monolingual setting. In this work, we explore hate speech detection in low-resource languages by transferring knowledge from a resource-rich language, English, in a zero-shot learning fashion. We experiment with traditional and recent neural architectures, and propose two joint-learning models, using different multilingual language representations to transfer knowledge between pairs of languages. We also evaluate the impact of additional knowledge in our experiment, by incorporating information from a multilingual lexicon of abusive words. The results show that our joint-learning models achieve the best performance on most languages. However, a simple approach that uses machine translation and a pre-trained English language model achieves a robust performance. In contrast, Multilingual BERT fails to obtain a good performance in cross-lingual hate speech detection. We also experimentally found that the external knowledge from a multilingual abusive lexicon is able to improve the models' performance, specifically in detecting the positive class. The results of our experimental evaluation highlight a number of challenges and issues in this particular task. One of the main challenges is related to the issue of current benchmarks for hate speech detection, in particular how bias related to the topical focus in the datasets influences the classification performance. The insufficient ability of current multilingual language models to transfer knowledge between languages in the specific hate speech detection task also remain an open problem. However, our experimental evaluation and our qualitative analysis show how the explicit integration of linguistic knowledge from a structured abusive language lexicon helps to alleviate this issue.
引用
收藏
页数:19
相关论文
共 50 条
  • [1] Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection
    Nozza, Debora
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 907 - 914
  • [2] Label modification and bootstrapping for zero-shot cross-lingual hate speech detection
    Bigoulaeva, Irina
    Hangya, Viktor
    Gurevych, Iryna
    Fraser, Alexander
    LANGUAGE RESOURCES AND EVALUATION, 2023, 57 (04) : 1515 - 1546
  • [3] Label modification and bootstrapping for zero-shot cross-lingual hate speech detection
    Irina Bigoulaeva
    Viktor Hangya
    Iryna Gurevych
    Alexander Fraser
    Language Resources and Evaluation, 2023, 57 : 1515 - 1546
  • [4] Rumour Detection via Zero-Shot Cross-Lingual Transfer Learning
    Tian, Lin
    Zhang, Xiuzhen
    Lau, Jey Han
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, 2021, 12975 : 603 - 618
  • [5] Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition
    Chatzoudis, Gerasimos
    Plitsis, Manos
    Stamouli, Spyridoula
    Dimou, Athanasia-Lida
    Katsamanis, Nassos
    Katsouros, Vassilis
    INTERSPEECH 2022, 2022, : 2178 - 2182
  • [6] Zero-Shot Cross-Lingual Transfer with Meta Learning
    Nooralahzadeh, Farhad
    Bekoulis, Giannis
    Bjerva, Johannes
    Augenstein, Isabelle
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 4547 - 4562
  • [7] Cross-lingual Contextualized Topic Models with Zero-shot Learning
    Bianchi, Federico
    Terragni, Silvia
    Hovy, Dirk
    Nozza, Debora
    Fersini, Elisabetta
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1676 - 1683
  • [8] Zero-Shot Learning for Cross-Lingual News Sentiment Classification
    Pelicon, Andraz
    Pranjic, Marko
    Miljkovic, Dragana
    Skrlj, Blaz
    Pollak, Senja
    APPLIED SCIENCES-BASEL, 2020, 10 (17):
  • [9] Zero-Shot Cross-lingual Semantic Parsing
    Sherborne, Tom
    Lapata, Mirella
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 4134 - 4153
  • [10] Zero-Shot Text Normalization via Cross-Lingual Knowledge Distillation
    Wang, Linqin
    Huang, Xiang
    Yu, Zhengtao
    Peng, Hao
    Gao, Shengxiang
    Mao, Cunli
    Huang, Yuxin
    Dong, Ling
    Yu, Philip S.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 4631 - 4646