A Turkish Hate Speech Dataset and Detection System

被引:0
|
作者
Beyhan, Fatih [1 ,2 ]
Carik, Buse [1 ,2 ]
Arin, Inanc [1 ,2 ]
Terzioglu, Aysecan [3 ]
Yanikoglu, Berrin [1 ,2 ]
Yeniterzi, Reyyan [1 ,2 ]
机构
[1] Sabanci Univ, Fac Engn & Nat Sci, TR-34956 Istanbul, Turkey
[2] Sabanci Univ, Ctr Excellence Data Analyt VERIM, TR-34956 Istanbul, Turkey
[3] Sabanci Univ, Fac Arts & Social Sci, TR-34956 Istanbul, Turkey
关键词
Hate speech detection; Deep learning; Turkish;
D O I
暂无
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Social media posts containing hate speech are reproduced and redistributed at an accelerated pace, reaching greater audiences at a higher speed. We present a machine learning system for automatic detection of hate speech in Turkish, along with a hate speech dataset consisting of tweets collected in two separate domains. We first adopted a definition for hate speech that is in line with our goals and amenable to easy annotation; then designed the annotation schema for annotating the collected tweets. The Istanbul Convention dataset consists of tweets posted following the withdrawal of Turkey from the Istanbul Convention. The Refugees dataset was created by collecting tweets about immigrants by filtering based on commonly used keywords related to immigrants. Finally, we have developed a hate speech detection system using the transformer architecture (BERTurk), to be used as a baseline for the collected dataset. The binary classification accuracy is 77% when the system is evaluated using 5-fold cross validation on the Istanbul Convention dataset and 71% for the Refugee dataset. We also tested a regression model with 0.66 and 0.83 RMSE on a scale of [0-4], for the Istanbul Convention and Refugees datasets.
引用
收藏
页码:4177 / 4185
页数:9
相关论文
共 50 条
  • [31] Hate Speech Detection with Comment Embeddings
    Djuric, Nemanja
    Zhou, Jing
    Morris, Robin
    Grbovic, Mihajlo
    Radosavljevic, Vladan
    Bhamidipati, Narayan
    WWW'15 COMPANION: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2015, : 29 - 30
  • [32] A New Hate Speech Detection System based on Textual and Psychological Features
    Alkomah, Fatimah
    Salati, Sanaz
    Ma, Xiaogang
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (08) : 860 - 869
  • [33] Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model
    Saleh, Hind
    Alhothali, Areej
    Moria, Kawthar
    APPLIED ARTIFICIAL INTELLIGENCE, 2023, 37 (01)
  • [34] Levantine hate speech detection in twitter
    AbdelHamid, Medyan
    Jafar, Assef
    Rahal, Yasser
    SOCIAL NETWORK ANALYSIS AND MINING, 2022, 12 (01)
  • [35] Hate Speech Detection in Roman Urdu
    Khan, Muhammad Moin
    Shahzad, Khurram
    Malik, Muhammad Kamran
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (01)
  • [36] TOXIGEN: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
    Hartvigsen, Thomas
    Gabriel, Saadia
    Palangi, Hamid
    Sap, Maarten
    Ray, Dipankar
    Kamar, Ece
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3309 - 3326
  • [37] A Federated Approach for Hate Speech Detection
    Gala, Jay
    Gandhi, Deep
    Mehta, Jash
    Talat, Zeerak
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 3248 - 3259
  • [38] Spanning the Spectrum of Hatred Detection: A Persian Multi-Label Hate Speech Dataset with Annotator Rationales
    Delbari, Zahra
    Moosavi, Nafise Sadat
    Pilehvar, Mohammad Taher
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 17889 - 17897
  • [39] FoR: A Dataset for Synthetic Speech Detection
    Reimao, Ricardo
    Tzerpos, Vassilios
    2019 10TH INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2019,
  • [40] Hate Speech is not Free Speech: Explainable Machine Learning for Hate Speech Detection in Code-Mixed Languages
    Yadav, Sargam
    Kaushik, Abhishek
    McDaid, Kevin
    2023 IEEE INTERNATIONAL SYMPOSIUM ON TECHNOLOGY AND SOCIETY, ISTAS, 2023,