Label modification and bootstrapping for zero-shot cross-lingual hate speech detection

Cited by: 7

Authors:

Bigoulaeva, Irina [1]

Hangya, Viktor [2]

Gurevych, Iryna [1]

Fraser, Alexander [2]

Affiliations:

[1] Tech Univ Darmstadt, Dept Comp Sci, Ubiquitous Knowledge Proc Lab, UKP Lab, Darmstadt, Germany

[2] Ludwig Maximilians Univ Munchen, Ctr Informat & Language Proc, Munich, Germany

Source:

LANGUAGE RESOURCES AND EVALUATION | 2023 / Volume 57 / Issue 04

Keywords:

Hate speech; Cross-lingual transfer learning; Class imbalance; BERT; CNN; LSTM;

DOI:

10.1007/s10579-023-09637-4

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

The goal of hate speech detection is to filter negative online content targeting certain groups of people. Due to the easy accessibility and multilinguality of social media platforms, it is crucial to protect everyone, which requires building hate speech detection systems for a wide range of languages. However, the available labeled hate speech datasets are limited, making it difficult to build systems for many languages. In this paper, we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages, while highlighting label issues across application scenarios, such as inconsistent label sets of corpora or differing hate speech definitions, which hinder the application of such methods. We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language, which lacks labeled examples, and show that good performance can be achieved. We then incorporate unlabeled target language data for further model improvements by bootstrapping labels using an ensemble of different model architectures. Furthermore, we investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance. We test simple data undersampling and oversampling techniques and show their effectiveness.

Pages: 1515-1546

Number of pages: 32
