A joint learning approach with knowledge injection for zero-shot cross-lingual hate speech detection

Cited by: 43
Authors
Pamungkas, Endang Wahyu [1 ]
Basile, Valerio [1 ]
Patti, Viviana [1 ]
Affiliation
[1] Univ Turin, Dept Comp Sci, Turin, Italy
Keywords
Hate speech detection; Cross-lingual classification; Social media; Transfer learning; Zero-shot learning; ETHNOPHAULISMS;
DOI
10.1016/j.ipm.2021.102544
Chinese Library Classification code
TP [Automation technology; computer technology];
Discipline classification code
0812 ;
Abstract
Hate speech is an increasingly important societal issue in the era of digital communication. Hateful expressions often make use of figurative language and, although they represent, in some sense, the dark side of language, they are also often prime examples of creative use of language. While hate speech is a global phenomenon, current studies on automatic hate speech detection are typically framed in a monolingual setting. In this work, we explore hate speech detection in low-resource languages by transferring knowledge from a resource-rich language, English, in a zero-shot learning fashion. We experiment with traditional and recent neural architectures, and propose two joint-learning models, using different multilingual language representations to transfer knowledge between pairs of languages. We also evaluate the impact of additional knowledge in our experiments by incorporating information from a multilingual lexicon of abusive words. The results show that our joint-learning models achieve the best performance on most languages. However, a simple approach that uses machine translation and a pre-trained English language model achieves robust performance. In contrast, Multilingual BERT fails to obtain good performance in cross-lingual hate speech detection. We also found experimentally that the external knowledge from a multilingual abusive lexicon is able to improve the models' performance, specifically in detecting the positive class. The results of our experimental evaluation highlight a number of challenges and issues in this particular task. One of the main challenges relates to current benchmarks for hate speech detection, in particular how bias related to the topical focus in the datasets influences classification performance. The insufficient ability of current multilingual language models to transfer knowledge between languages in the specific hate speech detection task also remains an open problem.
However, our experimental evaluation and our qualitative analysis show how the explicit integration of linguistic knowledge from a structured abusive language lexicon helps to alleviate this issue.
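The knowledge-injection idea described in the abstract can be sketched in a few lines: surface a language-independent signal from a multilingual abusive-word lexicon and feed it to a classifier trained only on the source language. The tiny lexicon and the function below are illustrative assumptions, not the actual resource or code used in the paper.

```python
# Minimal sketch of lexicon-based knowledge injection for zero-shot
# cross-lingual hate speech detection. The entries below are a toy
# stand-in for a real multilingual abusive-word lexicon.
ABUSIVE_LEXICON = {
    "en": {"idiot", "scum"},
    "it": {"idiota", "feccia"},
}

def lexicon_feature(tokens, lang):
    """Fraction of tokens that appear in the abusive lexicon for `lang`.

    Because the feature has the same meaning in every language, a model
    fitted on English examples can still exploit it at test time on a
    target language it has never seen labeled data for (zero-shot).
    """
    lexicon = ABUSIVE_LEXICON.get(lang, set())
    if not tokens:
        return 0.0
    return sum(token.lower() in lexicon for token in tokens) / len(tokens)

# Zero-shot usage: score an Italian sentence with no Italian training data.
score = lexicon_feature("sei un idiota".split(), "it")  # one hit in three tokens
```

A feature like this would be concatenated with the multilingual sentence representation before classification; the abstract reports that this kind of injected signal mainly helps detection of the positive (hateful) class.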
Pages: 19
Related papers
50 records in total
  • [41] Zero-shot Cross-lingual Transfer is Under-specified Optimization
    Wu, Shijie
    Van Durme, Benjamin
    Dredze, Mark
    PROCEEDINGS OF THE 7TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP, 2022, : 236 - 248
  • [42] Zero-shot Cross-Lingual Phonetic Recognition with External Language Embedding
    Gao, Heting
    Ni, Junrui
    Zhang, Yang
    Qian, Kaizhi
    Chang, Shiyu
    Hasegawa-Johnson, Mark
    INTERSPEECH 2021, 2021, : 1304 - 1308
  • [43] Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification
    Xenouleas, Stratos
    Tsoukara, Alexia
    Panagiotakis, Giannis
    Chalkidis, Ilias
    Androutsopoulos, Ion
    PROCEEDINGS OF THE 12TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE, SETN 2022, 2022,
  • [44] Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
    Hsu, Tsung-Yuan
    Liu, Chi-liang
    Lee, Hung-yi
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 5933 - 5940
  • [45] Cyclical Contrastive Learning Based on Geodesic for Zero-shot Cross-lingual Spoken Language Understanding
    Cheng, Xuxin
    Zhu, Zhihong
    Yang, Bang
    Zhuang, Xianwei
    Li, Hongxiang
    Zou, Yuexian
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 1806 - 1816
  • [46] Multi-level multilingual semantic alignment for zero-shot cross-lingual transfer learning
    Gui, Anchun
    Xiao, Han
    NEURAL NETWORKS, 2024, 173
  • [47] Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised?
    Koloski, Boshko
    Pollak, Senja
    Skrlj, Blaz
    Martinc, Matej
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 400 - 409
  • [48] Discrepancy and Uncertainty Aware Denoising Knowledge Distillation for Zero-Shot Cross-Lingual Named Entity Recognition
    Ge, Ling
    Hu, Chunming
    Ma, Guanghui
    Liu, Jihong
    Zhang, Hong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18056 - 18064
  • [49] Zero-Shot Cross-Lingual Transfer in Legal Domain Using Transformer Models
    Shaheen, Zein
    Wohlgenannt, Gerhard
    Mouromtsev, Dmitry
    2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI 2021), 2021, : 450 - 456
  • [50] Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
    Artetxe, Mikel
    Schwenk, Holger
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2019, 7 : 597 - 610