A Turkish Hate Speech Dataset and Detection System

Cited by: 0

Authors:

Beyhan, Fatih [1,2]

Carik, Buse [1,2]

Arin, Inanc [1,2]

Terzioglu, Aysecan [3]

Yanikoglu, Berrin [1,2]

Yeniterzi, Reyyan [1,2]

Affiliations:

[1] Sabanci Univ, Fac Engn & Nat Sci, TR-34956 Istanbul, Turkey

[2] Sabanci Univ, Ctr Excellence Data Analyt VERIM, TR-34956 Istanbul, Turkey

[3] Sabanci Univ, Fac Arts & Social Sci, TR-34956 Istanbul, Turkey

Source:

LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022

Keywords:

Hate speech detection; Deep learning; Turkish;

DOI:

Not available

Chinese Library Classification:

TP39 [Computer applications];

Subject classification codes:

081203 ; 0835 ;

Abstract:

Social media posts containing hate speech are reproduced and redistributed at an accelerated pace, reaching greater audiences at a higher speed. We present a machine learning system for automatic detection of hate speech in Turkish, along with a hate speech dataset consisting of tweets collected in two separate domains. We first adopted a definition of hate speech that is in line with our goals and amenable to easy annotation; we then designed the annotation schema for the collected tweets. The Istanbul Convention dataset consists of tweets posted following the withdrawal of Turkey from the Istanbul Convention. The Refugees dataset was created by collecting tweets about immigrants, filtered by commonly used keywords related to immigrants. Finally, we developed a hate speech detection system based on a transformer model (BERTurk), to be used as a baseline for the collected dataset. Evaluated with 5-fold cross-validation, the binary classification accuracy is 77% on the Istanbul Convention dataset and 71% on the Refugees dataset. We also tested a regression model, which reached an RMSE of 0.66 and 0.83 on a 0-4 scale for the Istanbul Convention and Refugees datasets, respectively.
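The evaluation quantities named in the abstract (5-fold cross-validation splits, binary accuracy, and RMSE on a 0-4 scale) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `k_fold_indices`, `accuracy`, and `rmse` are hypothetical, the fold layout (contiguous, unshuffled) is an assumption, and the sample values are made up for illustration only.

```python
import math


def k_fold_indices(n_items, k=5):
    """Split range(n_items) into k contiguous folds of near-equal size.

    Assumption: folds are contiguous and unshuffled; a real setup would
    usually shuffle (and possibly stratify) before splitting.
    """
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first folds
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def rmse(gold, pred):
    """Root mean squared error between gold and predicted scores."""
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold))


if __name__ == "__main__":
    # Made-up hate speech strength scores on a 0-4 scale (illustrative only).
    gold_scores = [0, 1, 2, 3, 4, 4, 0, 2, 1, 3]
    pred_scores = [0.5, 1.0, 2.5, 2.0, 4.0, 3.0, 0.0, 2.0, 2.0, 3.5]

    # Each fold serves once as the held-out test set.
    for fold_no, test_idx in enumerate(k_fold_indices(len(gold_scores), k=5), start=1):
        fold_gold = [gold_scores[i] for i in test_idx]
        fold_pred = [pred_scores[i] for i in test_idx]
        print(f"fold {fold_no}: RMSE = {rmse(fold_gold, fold_pred):.2f}")
```

In a cross-validated setup such as the one the abstract describes, a model would be retrained on the other four folds before scoring each held-out fold, and the per-fold metrics would then be averaged; the sketch above only shows the splitting and the metric arithmetic.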


Pages: 4177-4185

Page count: 9
