Improving Hate Speech Detection Accuracy using Hybrid CNN-RNN and Random Oversampling Techniques

Cited by: 0
Authors
Riyadi, Slamet [1]
Andriyani, Annisa Divayu [1]
Masyhur, Ahmad Musthafa [1]
Affiliations
[1] Univ Muhammadiyah Yogyakarta, Dept Informat Technol, Yogyakarta, Indonesia
Keywords
hate speech; Twitter; hybrid CNN-RNN; balancing dataset; oversampling
DOI
10.1109/ISIEA61920.2024.10607232
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Detecting hate speech is crucial for addressing online toxicity and fostering a secure digital environment. This study aims to improve the accuracy of hybrid CNN-RNN models, which are commonly used for this task. By combining random oversampling with the model, it aims to classify hate speech more reliably, particularly on imbalanced datasets. The dataset used is the Indonesian Tweet Hate Speech dataset. Following an established protocol of data pre-processing, training, and testing, the study observes significant improvements in accuracy. With imbalanced data, the hybrid CNN-RNN achieves 0.827 accuracy, 0.797 precision, 0.759 recall, and 0.883 F1 score. With balanced data, it performs better still, reaching 0.908 accuracy, 0.943 precision, 0.894 recall, and 0.914 F1 score. Notably, the proposed model outperforms the standard hybrid CNN-RNN, which reaches an accuracy of 0.752, precision of 0.797, recall of 0.559, and F1 score of 0.657 on imbalanced datasets. Techniques such as dropout and early termination mitigate overfitting in complex models and large datasets. This research contributes to hate speech detection methods and underscores the hybrid CNN-RNN's efficacy in handling imbalanced data. Future studies could explore additional methodologies for further improvement.
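The abstract does not describe how the oversampling step is implemented. As an illustration only, random oversampling is commonly understood as duplicating randomly chosen minority-class samples until every class is as large as the majority class. The following is a minimal sketch under that assumption; the function name `random_oversample`, its parameters, and the toy data are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Balance classes by duplicating randomly chosen minority samples.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        # Draw extra samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((text, label) for text in items + extra)

    # Shuffle so that duplicated samples are not clustered together.
    rng.shuffle(balanced)
    new_texts = [text for text, _ in balanced]
    new_labels = [label for _, label in balanced]
    return new_texts, new_labels


# Toy example: four non-hate tweets (0) and one hate tweet (1).
texts = ["t1", "t2", "t3", "t4", "t5"]
labels = [0, 0, 0, 0, 1]
balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

In a sketch like this, oversampling would normally be applied to the training split only, so that duplicated samples do not leak into the test set and inflate the reported metrics.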
Pages: 5