Understanding hate speech: the HateInsights dataset and model interpretability

Cited by: 0
Authors
Arshad, Muhammad Umair [1 ]
Shahzad, Waseem [1 ]
Affiliations
[1] Natl Univ Comp & Emerging Sci, Dept Artificial Intelligence & Data Sci, Islamabad, Pakistan
Keywords
Explainable AI; Hate speech; LLM; AI; Machine learning; Natural language processing; LANGUAGE;
DOI
10.7717/peerj-cs.2372
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The persistence of hate speech continues to pose an obstacle in the realm of online social media. Despite the continuous evolution of advanced models for identifying hate speech, the critical dimensions of interpretability and explainability have not received proportional scholarly attention. In this article, we introduce the HateInsights dataset, a groundbreaking benchmark in the field of hate speech datasets, encompassing diverse aspects of this widespread issue. Within our dataset, each individual post undergoes thorough annotation from dual perspectives: firstly, conforming to the established 3-class classification paradigm that includes hate speech, offensive language, and normal discourse; secondly, incorporating rationales that outline specific segments of a post supporting the assigned label (categorized as hate speech, offensive language, or normal discourse). Our exploration yields a significant finding by harnessing state-of-the-art models: even models demonstrating exceptional proficiency in classification tasks yield suboptimal outcomes in crucial explainability metrics, such as model plausibility and faithfulness. Furthermore, our analysis underscores a promising revelation concerning models trained using human-annotated rationales. To facilitate scholarly progress in this realm, we have made both our dataset and codebase accessible to fellow researchers. This initiative aims to encourage collaborative involvement and inspire the advancement of hate speech detection approaches characterized by increased transparency, clarity, and fairness.
Pages: 22
Related Papers (50 total)
  • [1] Understanding hate speech: the HateInsights dataset and model interpretability
    Arshad, Muhammad Umair
    Shahzad, Waseem
    PeerJ Computer Science, 2024, 10
  • [2] Understanding Interpretability: Explainable AI Approaches for Hate Speech Classifiers
    Yadav, Sargam
    Kaushik, Abhishek
    McDaid, Kevin
    EXPLAINABLE ARTIFICIAL INTELLIGENCE, XAI 2023, PT III, 2023, 1903 : 47 - 70
  • [3] Uncovering the Root of Hate Speech: A Dataset for Identifying Hate Instigating Speech
    Park, Hyoungjun
    Shim, Ho Sung
    Lee, Kyuhan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 6236 - 6245
  • [5] Understanding and appraising 'hate speech'
    Vilar-Lluch, Sara
    JOURNAL OF LANGUAGE AGGRESSION AND CONFLICT, 2023, 11 (02) : 279 - 306
  • [6] Towards an Intrinsic Interpretability Approach for Multimodal Hate Speech Detection
    Du, Pengfei
    Gao, Yali
    Li, Xiaoyong
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (13)
  • [7] Multiclass hate speech detection with an aggregated dataset
    Walsh, Sinead
    Greaney, Paul
    NATURAL LANGUAGE PROCESSING, 2025,
  • [8] A Turkish Hate Speech Dataset and Detection System
    Beyhan, Fatih
    Carik, Buse
    Arin, Inanc
    Terzioglu, Aysecan
    Yanikoglu, Berrin
    Yeniterzi, Reyyan
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4177 - 4185
  • [9] Annotation System to Build Cyberbullying and Hate Speech Detection Model Training Dataset
    Febriana, Trisna
    Budiarto, Arif
    CHIUXID 2020: 6TH INTERNATIONAL ACM IN-COOPERATION HCI AND UX CONFERENCE, 2020, : 29 - 30
  • [10] Towards an Organically Growing Hate Speech Dataset in Hate Speech Detection Systems in a Smart Mobility Application
    Alsamman, Ahmad
    Schmitz, Andreas
    Wimmer, Maria A.
    TOGETHER IN THE UNSTABLE WORLD: DIGITAL GOVERNMENT AND SOLIDARITY, 2023, : 36 - 43