Understanding Interpretability: Explainable AI Approaches for Hate Speech Classifiers

Cited: 1
Authors:
Yadav, Sargam [1 ]
Kaushik, Abhishek [1 ]
McDaid, Kevin [1 ]
Affiliations:
[1] Dundalk Institute of Technology, Dundalk, Ireland
Keywords:
explainable artificial intelligence; hate speech; LIME; SHAP; sentiment analysis; Hinglish; attention; transformers; BERT
DOI:
10.1007/978-3-031-44070-0_3
CLC Number:
TP18 [Artificial Intelligence Theory]
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract:
Cyberbullying and hate speech are two of the most significant problems in today's cyberspace. Automated artificial intelligence models could help detect and remove online hate speech, addressing a critical problem. As artificial intelligence continues to permeate numerous industries and drive critical decisions, a variety of explainable AI techniques are being developed to make model judgments and justifications intelligible to people. Our study focuses on code-mixed language (a mix of Hindi and English, commonly called Hinglish) and the Indian subcontinent; this language combination is used extensively across SAARC nations. Three transformer-based models and one machine learning model were trained and fine-tuned on the modified HASOC Identification of Conversational Hate Speech in Code-Mixed Languages (ICHCL) dataset for hate speech classification. Several explainability techniques, namely Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and model attention, were applied to the respective models to analyze their behavior. The analysis suggests that better-trained models and a systematic comparison of Explainable Artificial Intelligence (XAI) techniques would provide deeper insight into model decisions.
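The abstract names LIME among the explainability techniques applied to the fine-tuned classifiers. As an illustrative sketch only (this record contains no code from the paper), the following shows how LIME is typically wired to a transformer-based text classifier; the checkpoint name, class names, and sample sentence are placeholder assumptions, not the authors' artifacts.

```python
# Illustrative sketch: checkpoint, labels, and sample text are placeholders,
# not the authors' artifacts.
import torch
from lime.lime_text import LimeTextExplainer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint; the study fine-tunes transformer models on the
# modified HASOC-ICHCL code-mixed (Hinglish) dataset.
MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def predict_proba(texts):
    # LIME requires a function mapping a list of raw strings to an array of
    # class probabilities, one row per input text.
    enc = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

explainer = LimeTextExplainer(class_names=["not_hate", "hate"])
explanation = explainer.explain_instance(
    "yeh comment bilkul theek nahi hai",  # placeholder Hinglish sample
    predict_proba,
    num_features=10,  # report the ten most influential tokens
)
print(explanation.as_list())  # (token, weight) pairs behind the prediction
```

LIME perturbs the input text and fits a local surrogate model, so the returned token weights approximate each word's contribution to the hate/non-hate decision for that single example.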
Pages: 47-70 (24 pages)