TABHATE: A Target-based hate speech detection dataset in Hindi

Cited by: 0
Authors
Sharma, Deepawali [1, 2]
Singh, Vivek Kumar [3]
Gupta, Vedika [4]
Affiliations
[1] Banaras Hindu Univ, Dept Comp Sci, Varanasi, India
[2] Bennett Univ, Sch Comp Sci Engn & Technol SCSET, Greater Noida 201310, India
[3] Univ Delhi, Dept Comp Sci, Delhi 110007, India
[4] OP Jindal Global Univ, Jindal Global Business Sch, Sonipat 131001, Haryana, India
Keywords
Hate speech; Hate speech corpus; Hate speech dataset; Hindi language; Deep learning
DOI
10.1007/s13278-024-01355-1
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Social media has become a platform for expressing opinions and emotions, but some people also use it to spread hate, targeting individuals, groups, communities, or countries. There is therefore a need to identify such content and take corrective action. Over the last few years, several techniques have been developed to automatically detect hate speech and offensive or abusive content on social media platforms. However, the majority of studies have focused on hate speech detection in English-language texts only. The non-availability of suitable datasets is a major reason for the lack of research in other languages. Hindi is one such widely spoken language for which such datasets are not available. This work attempts to bridge this research gap by presenting a curated and annotated dataset for target-based hate speech (TABHATE) in the Hindi language. The suitability of the dataset is explored by applying some standard deep learning and transformer-based models to the task of hate speech detection. The experimental results show that the dataset can be used for experimental work on hate speech detection in Hindi-language texts.
Pages: 14
Related papers
50 in total
  • [1] HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
    Das, Mithun
    Saha, Punyajoy
    Mathew, Binny
    Mukherjee, Animesh
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 5378 - 5387
  • [2] A Turkish Hate Speech Dataset and Detection System
    Beyhan, Fatih
    Carik, Buse
    Arin, Inanc
    Terzioglu, Aysecan
    Yanikoglu, Berrin
    Yeniterzi, Reyyan
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4177 - 4185
  • [3] HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
    Mathew, Binny
    Saha, Punyajoy
    Yimam, Seid Muhie
    Biemann, Chris
    Goyal, Pawan
    Mukherjee, Animesh
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14867 - 14875
  • [4] Towards an Organically Growing Hate Speech Dataset in Hate Speech Detection Systems in a Smart Mobility Application
    Alsamman, Ahmad
    Schmitz, Andreas
    Wimmer, Maria A.
    TOGETHER IN THE UNSTABLE WORLD: DIGITAL GOVERNMENT AND SOLIDARITY, 2023, : 36 - 43
  • [5] Ceasing hate with MoH: Hate Speech Detection in Hindi-English code-switched language
    Sharma, Arushi
    Kabra, Anubha
    Jain, Minni
    INFORMATION PROCESSING & MANAGEMENT, 2022, 59 (01)
  • [6] HHSD: Hindi Hate Speech Detection Leveraging Multi-Task Learning
    Kapil, Prashant
    Kumari, Gitanjali
    Ekbal, Asif
    Pal, Santanu
    Chatterjee, Arindam
    Vinutha, B. N.
    IEEE ACCESS, 2023, 11 : 101460 - 101473
  • [7] A curated dataset for hate speech detection on social media text
    Mody, Devansh
    Huang, YiDong
    de Oliveira, Thiago Eustaquio Alves
    DATA IN BRIEF, 2023, 46
  • [8] ETHOS: a multi-label hate speech detection dataset
    Mollas, Ioannis
    Chrysopoulou, Zoe
    Karlos, Stamatis
    Tsoumakas, Grigorios
    COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (06) : 4663 - 4678
  • [9] ETHOS: a multi-label hate speech detection dataset
    Ioannis Mollas
    Zoe Chrysopoulou
    Stamatis Karlos
    Grigorios Tsoumakas
    Complex & Intelligent Systems, 2022, 8 : 4663 - 4678
  • [10] Hate Speech Detection in the Indonesian Language: A Dataset and Preliminary Study
    Alfina, Ika
    Mulia, Rio
    Fanany, Mohamad Ivan
    Ekanata, Yudo
    2017 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS), 2017, : 233 - 237