Systematic keyword and bias analyses in hate speech detection

Cited by: 1
Authors
Sarracén, Gretel Liz De la Peña [1]
Rosso, Paolo [1]
Affiliations
[1] Univ Politecn Valencia, Camino Vera S-N, Valencia 46022, Spain
Keywords
Hate speech detection; Keyword extraction; Bias analysis; Bias mitigation
DOI
10.1016/j.ipm.2023.103433
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hate speech detection refers broadly to the automatic identification of language that may be considered discriminatory against certain groups of people. The goal is to help online platforms identify and remove harmful content. Humans are usually capable of detecting hatred in critical cases, such as when the hatred is non-explicit, but how do computer models address this situation? In this work, we aim to contribute to the understanding of ethical issues related to hate speech by analysing two transformer-based models trained to detect hate speech. Our study focuses on the relationship between these models and a set of hateful keywords extracted from three well-known datasets. For the extraction of the keywords, we propose a metric that takes into account the division among classes to favour the most common words in hateful contexts. In our experiments, we first measure the overlap between the extracted keywords and the words to which the models pay the most attention in decision-making. We then investigate the bias of the models towards the extracted keywords. For the bias analysis, we characterize and use two metrics and evaluate two strategies to try to mitigate the bias. Surprisingly, we show that over 50% of the salient words of the models are not hateful and that there is a higher number of hateful words among the extracted keywords. However, we show that the models appear to be biased towards the extracted keywords. Experimental results suggest that fitting models with hateful texts that do not contain any of the keywords can reduce bias and improve the performance of the models.
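The abstract does not give the keyword metric or the overlap measure in closed form, so the following is only a minimal sketch of the general idea, under stated assumptions. It scores each word by how much more often it occurs in hateful texts than in non-hateful ones, then measures what share of a model's salient words fall among the top-scoring keywords. The scoring formula, the whitespace tokenisation, and the function names (`class_aware_scores`, `top_keywords`, `overlap_ratio`) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def doc_freq(texts):
    """Fraction of texts in which each lowercased whitespace token appears."""
    counts = Counter()
    for text in texts:
        counts.update(set(text.lower().split()))
    n = max(len(texts), 1)
    return {word: count / n for word, count in counts.items()}


def class_aware_scores(hateful, non_hateful):
    """Hypothetical score (an assumption, not the paper's metric):
    document frequency in hateful texts minus document frequency
    in non-hateful texts, for every word seen in a hateful text."""
    df_hate = doc_freq(hateful)
    df_other = doc_freq(non_hateful)
    return {w: f - df_other.get(w, 0.0) for w, f in df_hate.items()}


def top_keywords(hateful, non_hateful, k):
    """The k highest-scoring words; ties are broken alphabetically."""
    scores = class_aware_scores(hateful, non_hateful)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


def overlap_ratio(keywords, salient_words):
    """Share of the model's salient words that also appear among the keywords."""
    salient = set(salient_words)
    if not salient:
        return 0.0
    return len(salient & set(keywords)) / len(salient)
```

Here `salient_words` stands in for whatever words an attribution method assigns the most weight to for a given model; how those words are obtained is outside this sketch. A difference of document frequencies is only one of several ways to favour words that are common in the hateful class, chosen here for brevity.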
Pages: 14