Understanding hate speech: the HateInsights dataset and model interpretability

Authors
Arshad, Muhammad Umair [1]
Shahzad, Waseem [1]
Affiliations
[1] Natl Univ Comp & Emerging Sci, Dept Artificial Intelligence & Data Sci, Islamabad, Pakistan
Keywords
Explainable AI; Hate speech; LLM; AI; Machine learning; Natural language processing; LANGUAGE;
DOI
10.7717/peerj-cs.2372
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The persistence of hate speech continues to pose an obstacle in the realm of online social media. Despite the continuous evolution of advanced models for identifying hate speech, the critical dimensions of interpretability and explainability have not received proportional scholarly attention. In this article, we introduce the HateInsights dataset, a groundbreaking benchmark in the field of hate speech datasets, encompassing diverse aspects of this widespread issue. Within our dataset, each individual post undergoes thorough annotation from dual perspectives: firstly, conforming to the established 3-class classification paradigm that includes hate speech, offensive language, and normal discourse; secondly, incorporating rationales that outline specific segments of a post supporting the assigned label (categorized as hate speech, offensive language, or normal discourse). Our exploration yields a significant finding by harnessing cutting-edge state-of-the-art models: even models demonstrating exceptional proficiency in classification tasks yield suboptimal outcomes in crucial explainability metrics, such as model plausibility and faithfulness. Furthermore, our analysis underscores a promising revelation concerning models trained using human-annotated rationales. To facilitate scholarly progress in this realm, we have made both our dataset and codebase accessible to fellow researchers. This initiative aims to encourage collaborative involvement and inspire the advancement of the hate speech detection approach characterized by increased transparency, clarity, and fairness.
Pages: 22