Label modification and bootstrapping for zero-shot cross-lingual hate speech detection

Cited by: 6
Authors
Bigoulaeva, Irina [1]
Hangya, Viktor [2]
Gurevych, Iryna [1]
Fraser, Alexander [2]
Affiliations
[1] Tech Univ Darmstadt, Dept Comp Sci, Ubiquitous Knowledge Proc Lab, UKP Lab, Darmstadt, Germany
[2] Ludwig Maximilians Univ Munchen, Ctr Informat & Language Proc, Munich, Germany
Keywords
Hate speech; Cross-lingual transfer learning; Class imbalance; BERT; CNN; LSTM
DOI
10.1007/s10579-023-09637-4
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The goal of hate speech detection is to filter negative online content that targets certain groups of people. Because social media platforms are easily accessible and multilingual, protecting everyone requires building hate speech detection systems for a wide range of languages. However, the available labeled hate speech datasets are limited, making it difficult to build systems for many languages. In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages, while highlighting label issues across application scenarios, such as inconsistent label sets of corpora or differing hate speech definitions, which hinder the application of such methods. We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language, which lacks labeled examples, and show that good performance can be achieved. We then incorporate unlabeled target language data for further model improvements by bootstrapping labels using an ensemble of different model architectures. Furthermore, we investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance. We test simple data undersampling and oversampling techniques and show their effectiveness.
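The abstract names two techniques without spelling out their details: random undersampling and oversampling against class imbalance, and bootstrapping labels for unlabeled target-language data with an ensemble. The following is a minimal sketch of both ideas, not the authors' implementation. The function names, the unanimous-agreement rule for accepting a bootstrapped label, and the toy keyword "models" are all assumptions made for illustration.

```python
import random
from collections import Counter


def _group_by_class(examples, labels):
    """Collect examples into a dict keyed by their label."""
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def undersample(examples, labels, seed=0):
    """Randomly drop examples until every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(examples, labels)
    target = min(len(xs) for xs in by_class.values())
    out = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, target)]
    rng.shuffle(out)
    return out


def oversample(examples, labels, seed=0):
    """Randomly duplicate examples until every class is as large as the biggest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(examples, labels)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    rng.shuffle(out)
    return out


def bootstrap_labels(unlabeled, models):
    """Pseudo-label unlabeled texts, keeping only those on which all models agree.

    Assumption: unanimous agreement is the acceptance rule; the abstract does
    not state which rule the paper uses.
    """
    pseudo = []
    for x in unlabeled:
        votes = {model(x) for model in models}
        if len(votes) == 1:
            pseudo.append((x, votes.pop()))
    return pseudo


if __name__ == "__main__":
    texts = ["a", "b", "c", "d", "e", "f"]
    labels = [0, 0, 0, 0, 1, 1]  # 0 = non-hate (majority), 1 = hate (minority)
    print(Counter(y for _, y in undersample(texts, labels)))
    print(Counter(y for _, y in oversample(texts, labels)))

    # Hypothetical stand-ins for trained classifiers (e.g. CNN, LSTM, BERT).
    toy_models = [
        lambda t: int("bad" in t),
        lambda t: int("bad" in t or "ugly" in t),
        lambda t: int(t.startswith("bad")),
    ]
    print(bootstrap_labels(["bad thing", "ugly thing", "nice thing"], toy_models))
```

In this sketch the pseudo-labeled pairs returned by `bootstrap_labels` would be added to the training data for a further training round. Examples on which the ensemble disagrees are simply left out.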
Pages: 1515-1546
Page count: 32
Journal
Language Resources and Evaluation, 2023, vol. 57