Understanding Interpretability: Explainable AI Approaches for Hate Speech Classifiers

Cited by: 1
Authors
Yadav, Sargam [1 ]
Kaushik, Abhishek [1 ]
McDaid, Kevin [1 ]
Affiliations
[1] Dundalk Institute of Technology, Dundalk, Ireland
Keywords
explainable artificial intelligence; hate speech; LIME; SHAP; sentiment analysis; Hinglish; attention; transformers; BERT
DOI
10.1007/978-3-031-44070-0_3
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Cyberbullying and hate speech are two of the most significant problems in today's cyberspace. Automated artificial intelligence models could be used to detect and remove online hate speech, addressing a critical problem. As artificial intelligence continues to permeate numerous industries and drive critical change, a variety of explainable AI techniques are being developed to make model decisions and their justifications intelligible to people. Our study focuses on code-mixed language (a mix of Hindi and English) and the Indian subcontinent; this language combination is used extensively across SAARC nations. Three transformer-based models and one machine learning model were trained and fine-tuned on the modified HASOC Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) dataset for hate speech classification. Several explainability techniques, such as Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and model attention, were explored on the respective models to analyze their behavior. The analysis suggests that better-trained models and a comparison of Explainable Artificial Intelligence (XAI) techniques would provide better insight.
Pages: 47-70
Page count: 24
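
To illustrate the techniques named in the abstract, the sketch below shows how LIME and SHAP can be applied to a fine-tuned transformer hate-speech classifier. This is a minimal sketch, not the authors' code: the checkpoint name, the NOT/HOF label names, and the Hinglish example sentence are placeholder assumptions, and any model fine-tuned on the HASOC ICHCL data could be substituted.

```python
# Minimal sketch (not the authors' implementation) of applying LIME and SHAP
# to a transformer hate-speech classifier. Checkpoint, labels, and the input
# sentence are placeholder assumptions.
import torch
import shap
from lime.lime_text import LimeTextExplainer
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          pipeline)

MODEL_NAME = "bert-base-multilingual-cased"  # placeholder; use an ICHCL-tuned model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def predict_proba(texts):
    """Batch of raw strings -> class-probability array, as LIME expects."""
    enc = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

sample = "yeh post bilkul bakwas hai"  # illustrative code-mixed (Hinglish) input

# LIME: perturbs the input text and fits a local linear surrogate,
# yielding per-token weights for the predicted class.
lime_explainer = LimeTextExplainer(class_names=["NOT", "HOF"])  # HASOC-style labels
lime_exp = lime_explainer.explain_instance(sample, predict_proba, num_features=6)
print(lime_exp.as_list())  # (token, weight) pairs

# SHAP: wraps a text-classification pipeline and attributes the prediction
# to input tokens via Shapley values.
clf = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=None)
shap_explainer = shap.Explainer(clf)
shap_values = shap_explainer([sample])
print(shap_values)
```

The two attribution methods often disagree on which tokens matter, which is exactly the kind of cross-technique comparison the abstract points to as a source of better insight.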