Generalizing Hate Speech Detection Using Multi-Task Learning: A Case Study of Political Public Figures

Cited by: 2
Authors
Yuan, Lanqin [1]
Rizoiu, Marian-Andrei [1]
Affiliations
[1] Univ Technol Sydney, 15 Broadway, Sydney, NSW 2007, Australia
Keywords
Hate speech; Abusive speech; Multi-task learning; Public political figures; Transfer learning
DOI
10.1016/j.csl.2024.101690
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic identification of hateful and abusive content is vital in combating the spread of harmful online content and its damaging effects. Most existing works evaluate models by examining the generalization error on train-test splits of hate speech datasets. These datasets often differ in their definitions and labeling criteria, leading to poor generalization performance when predicting across new domains and datasets. This work proposes a new Multi-task Learning (MTL) pipeline that trains simultaneously across multiple hate speech datasets to construct a more encompassing classification model. Using a dataset-level leave-one-out evaluation (designating a dataset for testing and jointly training on all others), we trial the MTL detection on new, previously unseen datasets. Our results consistently outperform a large sample of existing work. We show strong results when examining the generalization error in train-test splits and substantial improvements when predicting on previously unseen datasets. Furthermore, we assemble a novel dataset, dubbed PUBFIGS, focusing on the problematic speech of American Public Political Figures. Using Amazon MTurk, we crowdsource labels for more than 20,000 tweets, and we machine-label problematic speech in all 305,235 tweets in PUBFIGS. We find that abusive and hateful tweeting mainly originates from right-leaning figures and relates to six topics, including Islam, women, ethnicity, and immigrants. We show that MTL builds embeddings that can simultaneously separate abusive from hate speech and identify its topics.
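The dataset-level leave-one-out evaluation described in the abstract can be illustrated with a minimal Python sketch. It only enumerates the splits (one held-out test dataset, all others used jointly for training) and does not reproduce the authors' pipeline or model. The function name `leave_one_dataset_out` and the dataset names are hypothetical placeholders, not taken from the paper.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_dataset_out(
    datasets: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (held_out_name, training_names) once per dataset.

    Each dataset is held out for testing in turn, and all remaining
    datasets are designated for joint training.
    """
    names = sorted(datasets)  # fixed order so the splits are reproducible
    for held_out in names:
        training = [name for name in names if name != held_out]
        yield held_out, training


# Hypothetical datasets (assumption: names and contents are placeholders).
datasets = {
    "dataset_a": ["example text 1", "example text 2"],
    "dataset_b": ["example text 3"],
    "dataset_c": ["example text 4", "example text 5"],
}

splits = list(leave_one_dataset_out(datasets))
for held_out, training in splits:
    print(f"test on {held_out}; train jointly on {training}")
```

In a real experiment, each split would train one multi-task model on the training datasets and report its metrics on the held-out dataset only.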
Pages: 20