Machine Learning Approach for the Detection of Hate Speech in Sinhala Unicode Text

Cited by: 3
Authors
Samarasinghe, S. W. A. M. D. [1 ]
Meegama, R. G. N. [1 ]
Punchimudiyanse, M. [2 ]
Affiliations
[1] Univ Sri Jayewardenepura, Fac Appl Sci, Dept Comp Sci, Apple Res & Dev Ctr, Nugegoda, Sri Lanka
[2] Open Univ Sri Lanka, Fac Nat Sci, Dept Math & Comp Sci, Nawala, Sri Lanka
Keywords
Convolutional Neural Networks; Word Embedding; N-Gram; Sinhala hate speech;
DOI
10.1109/ICTer51097.2020.9325493
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Hate speech published on online platforms has become a critical issue in Sri Lanka because it has caused conflicts between different ethnic groups. One of the main barriers to stopping this crime is the lack of resources for automatically detecting online hate content in Sinhala. Given the vast amount of content published on online platforms every minute, an automatic method is needed to address this issue. As a solution, we propose a deep learning mechanism that uses two convolutional neural networks (CNNs). The first classifies a given text as hateful or not. If the text contains hate content, the second classifies it according to its hate level, which authorities can use to make decisions. To convert the text data into numerical vectors, we use FastText word embeddings. Results indicate accuracies of 83% and 60% for hate speech classification and hate level classification, respectively.
Pages: 65-70
Number of pages: 6
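
The two-stage arrangement described in the abstract (a hate/not-hate classifier followed by a hate-level classifier that runs only on flagged text) can be sketched roughly as follows. This is a minimal illustration of the cascade control flow only: the names `classify_cascade`, `toy_is_hate`, and `toy_hate_level` are hypothetical, and the keyword-based stubs merely stand in for the paper's CNNs over FastText vectors, which are not reproduced here.

```python
from typing import Callable, Optional, Tuple

# Hypothetical type aliases: in the paper both stages are CNNs fed with
# FastText word vectors; here they are plain callables for illustration.
HateDetector = Callable[[str], bool]
LevelClassifier = Callable[[str], str]


def classify_cascade(
    text: str,
    is_hate: HateDetector,
    hate_level: LevelClassifier,
) -> Tuple[bool, Optional[str]]:
    """Run stage 1; run stage 2 only when stage 1 flags the text."""
    if not is_hate(text):
        # Non-hateful text never reaches the level classifier.
        return (False, None)
    return (True, hate_level(text))


# Toy stand-ins (assumptions, not the authors' models).
def toy_is_hate(text: str) -> bool:
    return "hate" in text.lower()


def toy_hate_level(text: str) -> str:
    return "high" if text.isupper() else "low"


if __name__ == "__main__":
    for sample in ["good morning", "i hate this", "I HATE THIS"]:
        print(sample, "->", classify_cascade(sample, toy_is_hate, toy_hate_level))
```

Keeping the two stages as separate callables mirrors the abstract's description, in which each stage is reported with its own accuracy (83% and 60%).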