TABHATE: A Target-based hate speech detection dataset in Hindi

Authors
Sharma, Deepawali [1, 2]
Singh, Vivek Kumar [3]
Gupta, Vedika [4]
Affiliations
[1] Banaras Hindu Univ, Dept Comp Sci, Varanasi, India
[2] Bennett Univ, Sch Comp Sci Engn & Technol SCSET, Greater Noida 201310, India
[3] Univ Delhi, Dept Comp Sci, Delhi 110007, India
[4] OP Jindal Global Univ, Jindal Global Business Sch, Sonipat 131001, Haryana, India
Keywords
Hate speech; Hate speech corpus; Hate speech dataset; Hindi language; Deep learning
DOI
10.1007/s13278-024-01355-1
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Social media has become a platform for expressing opinions and emotions, but some people also use it to spread hate, targeting individuals, groups, communities, or countries. There is therefore a need to identify such content and take corrective action. Over the last few years, several techniques have been developed to automatically detect hate speech and offensive or abusive content on social media platforms. However, the majority of these studies have focused on English-language texts only. The lack of suitable datasets is a major reason for the shortage of research in other languages. Hindi is one such widely spoken language for which suitable datasets are not available. This work attempts to bridge this research gap by presenting a curated and annotated dataset for target-based hate speech (TABHATE) in the Hindi language. The suitability of the dataset is explored by applying some standard deep learning and transformer-based models to the task of hate speech detection. The experimental results show that the dataset can be used for experimental work on hate speech detection in Hindi-language texts.
Pages: 14