Improving hate speech detection using Cross-Lingual Learning

Cited by: 5
Authors
Firmino, Anderson Almeida [1]
Baptista, Claudio de Souza [1]
de Paiva, Anselmo Cardoso [2]
Affiliations
[1] Univ Fed Campina Grande, Rua Aprigio Veloso 882, Campina Grande, PB, Brazil
[2] Univ Fed Maranhao, Ave Portugueses 1966, Sao Luis, MA, Brazil
Keywords
Hate speech detection; Natural language processing; Social media; Cross-Lingual Learning; Deep learning
DOI
10.1016/j.eswa.2023.121115
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The worldwide growth of social media has brought both social benefits and challenges, among them the proliferation of hate speech. We propose a novel method for detecting hate speech in text using Cross-Lingual Learning. Our approach applies transfer learning with Pre-Trained Language Models (PTLMs), exploiting the large corpora available in high-resource languages to solve the same task in languages with fewer resources. The proposed methodology comprises four stages: corpora acquisition, PTLM definition, training strategies, and evaluation. We carried out experiments with BERT and XLM-R Pre-Trained Language Models in English, Italian, and Portuguese to determine which best suited the proposed method. We used corpora in English (WH) and Italian (Evalita 2018) as the source languages and the OffComBr-2 corpus in Portuguese as the target language. The experimental results show that the proposed methodology is promising: on the OffComBr-2 corpus, it achieved the best result in the state of the art (F1-measure = 92%).
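For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-measure can be computed for a binary hate/non-hate classifier. It is not the authors' evaluation code: the abstract does not state whether the 92% figure is a binary, macro, or weighted F1, so the binary form is an assumption, and the function name `f1_measure` and the toy labels are hypothetical.

```python
def f1_measure(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = hate speech, 0 = not hate speech.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_measure(y_true, y_pred)
print(f"F1-measure = {score:.2f}")
```

Because F1 balances precision and recall, it is a common choice for hate speech corpora, where the hateful class is often the minority and plain accuracy can be misleading.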
Pages: 13
Related papers
50 records in total
  • [1] Cross-Lingual Few-Shot Hate Speech and Offensive Language Detection Using Meta Learning
    Mozafari, Marzieh
    Farahbakhsh, Reza
    Crespi, Noel
    [J]. IEEE ACCESS, 2022, 10 : 14880 - 14896
  • [2] Cross-lingual Capsule Network for Hate Speech Detection in Social Media
    Jiang, Aiqi
    Zubiaga, Arkaitz
    [J]. PROCEEDINGS OF THE 32ND ACM CONFERENCE ON HYPERTEXT AND SOCIAL MEDIA (HT '21), 2021, : 217 - 223
  • [3] Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection
    Nozza, Debora
    [J]. ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 907 - 914
  • [4] A cross-lingual transfer learning method for online COVID-19-related hate speech detection
    Liu, Lin
    Xu, Duo
    Zhao, Pengfei
    Zeng, Daniel Dajun
    Hu, Paul Jen-Hwa
    Zhang, Qingpeng
    Luo, Yin
    Cao, Zhidong
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 234
  • [5] A joint learning approach with knowledge injection for zero-shot cross-lingual hate speech detection
    Pamungkas, Endang Wahyu
    Basile, Valerio
    Patti, Viviana
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (04)
  • [6] Label modification and bootstrapping for zero-shot cross-lingual hate speech detection
    Bigoulaeva, Irina
    Hangya, Viktor
    Gurevych, Iryna
    Fraser, Alexander
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2023, 57 (04) : 1515 - 1546
  • [7] Label modification and bootstrapping for zero-shot cross-lingual hate speech detection
    Irina Bigoulaeva
    Viktor Hangya
    Iryna Gurevych
    Alexander Fraser
    [J]. Language Resources and Evaluation, 2023, 57 : 1515 - 1546
  • [8] Using Cross Lingual Learning for Detecting Hate Speech in Portuguese
    Firmino, Anderson Almeida
    Baptista, Claudio de Souza
    de Paiva, Anselmo Cardoso
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2021, PT II, 2021, 12924 : 170 - 175
  • [9] Detecting Hate Speech in Cross-Lingual and Multi-lingual Settings Using Language Agnostic Representations
    Rodriguez, Sebastian E.
    Allende-Cid, Hector
    Allende, Hector
    [J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2021, 2021, 12702 : 77 - 87
  • [10] IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS
    Le Minh Nguyen
    Nayak, Shekhar
    Coler, Matt
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 792 - 797