TABHATE: A Target-based hate speech detection dataset in Hindi


Authors:

Sharma, Deepawali [1, 2]

Singh, Vivek Kumar [3]

Gupta, Vedika [4]

Affiliations:

[1] Banaras Hindu Univ, Dept Comp Sci, Varanasi, India

[2] Bennett Univ, Sch Comp Sci Engn & Technol SCSET, Greater Noida 201310, India

[3] Univ Delhi, Dept Comp Sci, Delhi 110007, India

[4] OP Jindal Global Univ, Jindal Global Business Sch, Sonipat 131001, Haryana, India

Source:

SOCIAL NETWORK ANALYSIS AND MINING | 2024 / Vol. 14 / Issue 01

Keywords:

Hate speech; Hate speech corpus; Hate speech dataset; Hindi language; Deep learning

DOI:

10.1007/s13278-024-01355-1

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Social media has become a platform for expressing opinions and emotions, but some people also use it to spread hate, targeting individuals, groups, communities, or countries. Therefore, there is a need to identify such content and take corrective action. During the last few years, several techniques have been developed to automatically detect and identify hate speech and offensive and abusive content on social media platforms. However, the majority of studies have focused on hate speech detection in English-language texts only. The non-availability of suitable datasets is a major reason for the lack of research in other languages. Hindi is one such widely spoken language for which such datasets are not available. This work attempts to bridge this research gap by presenting a curated and annotated dataset for target-based hate speech (TABHATE) in the Hindi language. The suitability of the dataset is explored by applying some standard deep learning and transformer-based models to the task of hate speech detection. The experimental results show that the dataset can be used for experimental work on hate speech detection in Hindi-language texts.

Pages: 14
