A Turkish Hate Speech Dataset and Detection System

Cited by: 0

Authors:

Beyhan, Fatih [1,2]

Carik, Buse [1,2]

Arin, Inanc [1,2]

Terzioglu, Aysecan [3]

Yanikoglu, Berrin [1,2]

Yeniterzi, Reyyan [1,2]

Affiliations:

[1] Sabanci Univ, Fac Engn & Nat Sci, TR-34956 Istanbul, Turkey

[2] Sabanci Univ, Ctr Excellence Data Analyt VERIM, TR-34956 Istanbul, Turkey

[3] Sabanci Univ, Fac Arts & Social Sci, TR-34956 Istanbul, Turkey

Source:

LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022

Keywords:

Hate speech detection; Deep learning; Turkish;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Social media posts containing hate speech are reproduced and redistributed at an accelerated pace, reaching greater audiences at a higher speed. We present a machine learning system for automatic detection of hate speech in Turkish, along with a hate speech dataset consisting of tweets collected in two separate domains. We first adopted a definition of hate speech that is in line with our goals and amenable to easy annotation; we then designed the annotation schema for the collected tweets. The Istanbul Convention dataset consists of tweets posted following the withdrawal of Turkey from the Istanbul Convention. The Refugees dataset was created by collecting tweets about immigrants, filtered by commonly used keywords related to immigrants. Finally, we developed a hate speech detection system based on a transformer model (BERTurk), to be used as a baseline for the collected dataset. Evaluated with 5-fold cross-validation, the binary classification accuracy is 77% on the Istanbul Convention dataset and 71% on the Refugees dataset. We also tested a regression model, which obtains an RMSE of 0.66 and 0.83 on a scale of [0-4] for the Istanbul Convention and Refugees datasets, respectively.
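The two metrics quoted in the abstract (binary accuracy and RMSE on a 0-4 scale, averaged over 5 folds) can be illustrated with a minimal sketch. It is not the authors' code: the toy labels, the helper names, and the binarisation threshold are all assumptions made for illustration, and the sketch only scores predictions that are assumed to already exist for each held-out fold (no BERTurk training is performed).

```python
import math


def k_fold_indices(n_items, k=5):
    """Split range(n_items) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between gold and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: gold hate-speech strength on the 0-4 scale and
# made-up regression outputs standing in for held-out predictions.
gold = [0, 1, 4, 2, 3, 0, 4, 1, 2, 3]
pred = [0.5, 1.0, 3.0, 2.0, 3.5, 0.0, 4.0, 2.0, 2.0, 2.5]

# Assumed threshold for turning a strength score into a binary label;
# the abstract does not state how the binary labels were derived.
THRESHOLD = 2
gold_bin = [int(g >= THRESHOLD) for g in gold]
pred_bin = [int(p >= THRESHOLD) for p in pred]

fold_acc, fold_rmse = [], []
for fold in k_fold_indices(len(gold), k=5):
    fold_acc.append(accuracy([gold_bin[i] for i in fold],
                             [pred_bin[i] for i in fold]))
    fold_rmse.append(rmse([gold[i] for i in fold],
                          [pred[i] for i in fold]))

# Report the mean of the per-fold scores.
mean_acc = sum(fold_acc) / len(fold_acc)
mean_rmse = sum(fold_rmse) / len(fold_rmse)
print(f"mean accuracy: {mean_acc:.2f}, mean RMSE: {mean_rmse:.2f}")
```

In a real evaluation each fold's predictions would come from a model trained on the other four folds, and a stratified or shuffled split would usually replace the contiguous one used here.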


Pages: 4177-4185

Page count: 9
