A Large Language Model Approach to Detect Hate Speech in Political Discourse Using Multiple Language Corpora

Cited by: 1
Authors
de Oliveira, Aillkeen Bezerra [1]
Baptista, Claudio de Souza [1]
Firmino, Anderson Almeida [1]
de Paiva, Anselmo Cardoso [2]
Affiliations
[1] Univ Fed Campina Grande, Campina Grande, Paraiba, Brazil
[2] Univ Fed Maranhao, Sao Luis, Maranhao, Brazil
Keywords
Hate Speech; Large Language Model; Cross-Lingual Learning; Machine Learning; Natural Language Processing
DOI
10.1145/3605098.3635964
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this era of unprecedented digital connectivity, hate speech has become a focal point of societal discussion. Digital communication platforms have fundamentally transformed how hate speech spreads: online social media and messaging apps disseminate it rapidly, a problem exacerbated by the anonymity the internet affords. Computational technology has emerged as a valuable tool for identifying and mitigating hate speech on social media. In this work, we use five distinct corpora representing English, Italian, Filipino, German, and Turkish. We propose a Large Language Model (GPT-3) enhanced with Cross-Lingual Learning to improve hate speech detection in English and Italian. Our approach, JL/CL+, combines two strategies: Joint Learning (JL) and Cascade Learning (CL). Even with data that differ lexically across languages, our approach achieves an F1-score of 96.58% for English and 92.05% for Italian.
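The abstract names Joint Learning and Cascade Learning without defining them. The sketch below is one possible reading, assuming the usual cross-lingual sense of the terms: joint learning pools several corpora into a single training set, and cascade learning trains on one set and then continues training on another. How JL/CL+ actually combines the two is not stated in this record, so the function names (`joint_stage`, `cascade_stages`, `jl_cl_schedule`), the ordering of stages, and the toy data are all hypothetical and only illustrate how the training data might be scheduled. No model is trained here.

```python
import random
from typing import Dict, List, Tuple

# (text, label): label 1 = hate speech, 0 = not hate speech
Example = Tuple[str, int]


def joint_stage(corpora: Dict[str, List[Example]], seed: int = 0) -> List[Example]:
    """Joint Learning (assumed meaning): pool all corpora into one shuffled set."""
    pooled = [ex for lang in sorted(corpora) for ex in corpora[lang]]
    random.Random(seed).shuffle(pooled)
    return pooled


def cascade_stages(first: List[Example], second: List[Example]) -> List[List[Example]]:
    """Cascade Learning (assumed meaning): train on `first`, then on `second`."""
    return [list(first), list(second)]


def jl_cl_schedule(
    sources: Dict[str, List[Example]], target: List[Example], seed: int = 0
) -> List[List[Example]]:
    """Hypothetical JL/CL combination: a joint stage over the source
    languages, followed by a cascade stage on the target language."""
    return cascade_stages(joint_stage(sources, seed), target)


# Toy placeholder data; the real corpora are not part of this record.
sources = {
    "de": [("placeholder de 1", 0), ("placeholder de 2", 1)],
    "tr": [("placeholder tr 1", 1)],
}
target_en = [("placeholder en 1", 0), ("placeholder en 2", 1)]

stages = jl_cl_schedule(sources, target_en)
for i, stage in enumerate(stages, start=1):
    print(f"stage {i}: {len(stage)} examples")
```

Each returned stage would be passed, in order, to whatever fine-tuning routine is in use; keeping the schedule separate from the training loop makes it easy to compare a joint-only, cascade-only, or combined setup on the same data.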
Pages: 1461-1468
Number of pages: 8