A Turkish Hate Speech Dataset and Detection System

Authors
Beyhan, Fatih [1, 2]
Carik, Buse [1, 2]
Arin, Inanc [1, 2]
Terzioglu, Aysecan [3]
Yanikoglu, Berrin [1, 2]
Yeniterzi, Reyyan [1, 2]
Affiliations
[1] Sabanci Univ, Fac Engn & Nat Sci, TR-34956 Istanbul, Turkey
[2] Sabanci Univ, Ctr Excellence Data Analyt VERIM, TR-34956 Istanbul, Turkey
[3] Sabanci Univ, Fac Arts & Social Sci, TR-34956 Istanbul, Turkey
Keywords
Hate speech detection; Deep learning; Turkish
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Social media posts containing hate speech are reproduced and redistributed at an accelerated pace, reaching larger audiences more quickly. We present a machine learning system for the automatic detection of hate speech in Turkish, along with a hate speech dataset consisting of tweets collected in two separate domains. We first adopted a definition of hate speech that is in line with our goals and amenable to easy annotation, and then designed the annotation schema for the collected tweets. The Istanbul Convention dataset consists of tweets posted following the withdrawal of Turkey from the Istanbul Convention. The Refugees dataset was created by collecting tweets about immigrants, filtered using commonly used immigrant-related keywords. Finally, we developed a hate speech detection system based on the transformer architecture (BERTurk), to be used as a baseline for the collected dataset. Binary classification accuracy is 77% when the system is evaluated using 5-fold cross-validation on the Istanbul Convention dataset and 71% on the Refugees dataset. We also tested a regression model, which obtained an RMSE of 0.66 and 0.83 on a 0-4 scale for the Istanbul Convention and Refugees datasets, respectively.
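The evaluation protocol mentioned in the abstract (k-fold cross-validation, with accuracy for the binary task and RMSE for the regression task) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the `fit_predict` callback, and the seeded shuffle are all assumptions made for the example, and no BERTurk model is trained here.

```python
import math
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle indices 0..n_items-1 and split them into k near-equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between gold and predicted scores."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def cross_validate(labels, fit_predict, metric, k=5, seed=0):
    """Average `metric` over k held-out folds.

    `fit_predict(train_idx, test_idx)` is a hypothetical callback that trains
    on the items in train_idx and returns one prediction per index in test_idx.
    """
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        preds = fit_predict(train_idx, test_idx)
        scores.append(metric([labels[j] for j in test_idx], preds))
    return sum(scores) / k
```

In practice `fit_predict` would wrap fine-tuning and inference of the chosen classifier or regressor; passing `accuracy` or `rmse` as `metric` selects which of the two reported measures is averaged over the folds.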
Pages: 4177-4185
Page count: 9