Understanding Interpretability: Explainable AI Approaches for Hate Speech Classifiers

Cited by: 1
|
Authors
Yadav, Sargam [1 ]
Kaushik, Abhishek [1 ]
McDaid, Kevin [1 ]
Affiliations
[1] Dundalk Inst Technol, Dundalk, Ireland
Keywords
explainable artificial intelligence; hate speech; LIME; SHAP; sentiment analysis; Hinglish; Attention; Transformers; BERT;
DOI
10.1007/978-3-031-44070-0_3
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cyberbullying and hate speech are two of the most significant problems in today's cyberspace. Automated artificial intelligence models could be used to detect and remove online hate speech, addressing a critical problem. As artificial intelligence continues to permeate numerous industries and drive critical decisions, a variety of explainable AI strategies are being developed to make model judgments and justifications intelligible to people. Our study focuses on code-mixed language (a mix of Hindi and English) and the Indian subcontinent; this language combination is used extensively in SAARC nations. Three transformer-based models and one machine learning model were trained and fine-tuned on the modified HASOC Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) dataset for hate speech classification. Several explainability techniques, such as Local Interpretable Model-Agnostic Explanations (LIME), Shapley Additive Explanations (SHAP), and model attention, were applied to the respective models to analyze their behavior. The analysis suggests that better-trained models and comparison across Explainable Artificial Intelligence (XAI) techniques would provide better insight.
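The perturbation idea behind LIME can be illustrated with a minimal, stdlib-only sketch: score an input, re-score it with each word masked, and treat the score drop as that word's importance (leave-one-out occlusion, a simplified stand-in for LIME's locally fitted linear model). The toy keyword classifier and lexicon weights below are hypothetical illustrations, not the paper's fine-tuned transformer models:

```python
# Hypothetical hate-speech lexicon with made-up weights, used only to
# give the toy classifier something to score against.
HATE_LEXICON = {"hate": 0.9, "stupid": 0.6}

def toy_score(text: str) -> float:
    """Return a pseudo-probability that `text` is hate speech."""
    words = text.lower().split()
    return min(1.0, sum(HATE_LEXICON.get(w, 0.0) for w in words))

def occlusion_importance(text: str) -> dict[str, float]:
    """Importance of each word = score(full text) - score(text with word removed)."""
    words = text.split()
    base = toy_score(text)
    importance = {}
    for i, w in enumerate(words):
        masked = " ".join(words[:i] + words[i + 1:])
        importance[w] = base - toy_score(masked)
    return importance

print(occlusion_importance("I hate this stupid example"))
```

The real LIME library instead samples many perturbed variants and fits a weighted linear surrogate, which is more robust when features interact; occlusion is shown here only because it is self-contained.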
Pages: 47 - 70
Page count: 24
Related Papers
50 records
  • [1] Understanding hate speech: the HateInsights dataset and model interpretability
    Arshad, Muhammad Umair
    Shahzad, Waseem
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [3] Hate Speech Detection in Audio Using SHAP - An Explainable AI
    Imbwaga, Joan L.
    Chittaragi, Nagaratna B.
    Koolagudi, Shashidhar G.
    ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2023, PT II, 2024, 2091 : 289 - 304
  • [4] Decoding Fake News and Hate Speech: A Survey of Explainable AI Techniques
    Ngueajio, Mikel K.
    Aryal, Saurav
    Atemkeng, Marcellin
    Washington, Gloria
    Rawat, Danda
    ACM COMPUTING SURVEYS, 2025, 57 (07)
  • [5] Improving Hate Speech Classification Through Ensemble Learning and Explainable AI Techniques
    Garg, Priya
    Sharma, M. K.
    Kumar, Parteek
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024,
  • [6] Explainable AI and Causal Understanding: Counterfactual Approaches Considered
    Baron, Sam
    MINDS AND MACHINES, 2023, 33 : 347 - 377
  • [8] Fine-Grained Multilingual Hate Speech Detection Using Explainable AI and Transformers
    Siddiqui, Jawaid Ahmed
    Yuhaniz, Siti Sophiayati
    Shaikh, Ghulam Mujtaba
    Soomro, Safdar Ali
    Mahar, Zafar Ali
    IEEE ACCESS, 2024, 12 : 143177 - 143192
  • [9] Hate and Aggression Analysis in NLP with Explainable AI
    Raman, Shatakshi
    Gupta, Vedika
    Nagrath, Preeti
    Santosh, K. C.
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (15)
  • [10] A survey of explainable AI techniques for detection of fake news and hate speech on social media platforms
    Gongane, Vaishali U.
    Munot, Mousami V.
    Anuse, Alwin D.
    JOURNAL OF COMPUTATIONAL SOCIAL SCIENCE, 2024, 7 (01): : 587 - 623