Hate Speech Detection in Bahasa Indonesia: Challenges and Opportunities

Authors
Pamungkas, Endang Wahyu [1, 3]
Putri, Divi Galih Prasetyo [2]
Fatmawati, Azizah [1]
Affiliations
[1] Univ Muhammadiyah Surakarta, Informat Engn Dept, Surakarta, Indonesia
[2] Univ Gadjah Mada, Vocat Sch, Software Engn Dept, Yogyakarta, Indonesia
[3] Univ Muhammadiyah Surakarta, Social Informat Res Ctr, Surakarta, Indonesia
Keywords
Abusive language; hate speech detection; machine learning; social media
DOI
10.14569/IJACSA.2023.01406125
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This study surveys current research on detecting abusive language in Indonesian social media, examining existing datasets, methods, challenges, and opportunities. Most existing datasets for abusive language detection were collected from social media platforms such as Twitter, Facebook, and Instagram, with Twitter the most commonly used source, and hate speech is the most researched type of abusive language. Both traditional machine learning and deep learning approaches have been implemented for this task, with deep learning models showing more competitive results; transformer-based models, however, remain less popular in Indonesian hate speech studies. The study emphasizes the importance of exploring more diverse phenomena, such as Islamophobia and political hate speech, and suggests crowdsourcing as a potential approach to annotating datasets. It also encourages researchers to address code-mixing in Indonesian abusive language datasets, since doing so could improve overall model performance on Indonesian data. The study further suggests that the lack of effective regulation, the anonymity afforded to users on most social networking sites, and the growing number of Twitter users in Indonesia have contributed to the rising prevalence of hate speech in Indonesian social media. Finally, it notes the importance of considering code-mixed language, out-of-vocabulary words, grammatical errors, and limited context when working with social media data.
Pages: 1175-1181
Page count: 7