TABHATE: A Target-based hate speech detection dataset in Hindi

Authors
Sharma, Deepawali [1, 2]
Singh, Vivek Kumar [3]
Gupta, Vedika [4]
Affiliations
[1] Banaras Hindu Univ, Dept Comp Sci, Varanasi, India
[2] Bennett Univ, Sch Comp Sci Engn & Technol SCSET, Greater Noida 201310, India
[3] Univ Delhi, Dept Comp Sci, Delhi 110007, India
[4] OP Jindal Global Univ, Jindal Global Business Sch, Sonipat 131001, Haryana, India
Keywords
Hate speech; Hate speech corpus; Hate speech dataset; Hindi language; Deep learning
DOI
10.1007/s13278-024-01355-1
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Social media has become a platform for expressing opinions and emotions, but some people also use it to spread hate, targeting individuals, groups, communities, or countries. Therefore, there is a need to identify such content and take corrective action. Over the last few years, several techniques have been developed to automatically detect and identify hate speech and offensive or abusive content on social media platforms. However, the majority of studies have focused on hate speech detection in English-language texts only. The lack of suitable datasets is a major reason for the limited research in other languages. Hindi is one such widely spoken language for which such datasets are not available. This work attempts to bridge this research gap by presenting a curated and annotated dataset for target-based hate speech (TABHATE) in the Hindi language. The suitability of the dataset is explored by applying standard deep learning and transformer-based models to the task of hate speech detection. The experimental results show that the dataset can be used for experimental work on hate speech detection in Hindi-language texts.
Pages: 14