TABHATE: A Target-based hate speech detection dataset in Hindi

Cited by: 0
Authors
Sharma, Deepawali [1, 2]
Singh, Vivek Kumar [3 ]
Gupta, Vedika [4 ]
Affiliations
[1] Banaras Hindu Univ, Dept Comp Sci, Varanasi, India
[2] Bennett Univ, Sch Comp Sci Engn & Technol SCSET, Greater Noida 201310, India
[3] Univ Delhi, Dept Comp Sci, Delhi 110007, India
[4] OP Jindal Global Univ, Jindal Global Business Sch, Sonipat 131001, Haryana, India
Keywords
Hate speech; Hate speech corpus; Hate speech dataset; Hindi language; Deep learning
DOI
10.1007/s13278-024-01355-1
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Social media has become a platform for expressing opinions and emotions, but some people also use it to spread hate, targeting individuals, groups, communities, or countries. There is therefore a need to identify such content and take corrective action. Over the last few years, several techniques have been developed to automatically detect hate speech and offensive or abusive content on social media platforms. However, the majority of studies have focused on hate speech detection in English-language texts only. The non-availability of suitable datasets is a major reason for the lack of research in other languages. Hindi is one such widely spoken language for which such datasets are not available. This work attempts to bridge this research gap by presenting a curated and annotated dataset for target-based hate speech (TABHATE) in the Hindi language. The suitability of the dataset is explored by applying some standard deep learning and transformer-based models to the task of hate speech detection. The experimental results show that the dataset can be used for experimental work on hate speech detection in Hindi-language texts.
Pages: 14