Improving Hate Speech Detection Accuracy using Hybrid CNN-RNN and Random Oversampling Techniques

Authors
Riyadi, Slamet [1]
Andriyani, Annisa Divayu [1]
Masyhur, Ahmad Musthafa [1]
Affiliations
[1] Univ Muhammadiyah Yogyakarta, Dept Informat Technol, Yogyakarta, Indonesia
Keywords
hate speech; Twitter; hybrid CNN-RNN; balancing dataset; oversampling
DOI
10.1109/ISIEA61920.2024.10607232
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Detecting hate speech is crucial for addressing online toxicity and fostering a secure digital environment. This study aims to improve the accuracy of hybrid CNN-RNN models, which are commonly used for this task. By combining oversampling techniques with the model, the research aims to classify instances of hate speech more reliably, particularly in imbalanced datasets. The dataset used in this study is the Indonesian Tweet Hate Speech dataset. Following an established protocol of data pre-processing, training, and testing, significant improvements in accuracy are observed. With imbalanced data, the hybrid CNN-RNN achieves 0.827 accuracy, 0.797 precision, 0.759 recall, and 0.883 F1 score. With balanced data the model performs better still, reaching 0.908 accuracy, 0.943 precision, 0.894 recall, and 0.914 F1 score. Notably, the proposed model outperforms the standard hybrid CNN-RNN on imbalanced data, which attains an accuracy of 0.752, precision of 0.797, recall of 0.559, and F1 score of 0.657. Techniques such as dropout and early stopping mitigate overfitting in complex models and large datasets. This research contributes to hate speech detection methods and underscores the hybrid CNN-RNN's efficacy in handling imbalanced data. Future studies could explore additional methodologies for further improvement.
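
The balancing step mentioned in the abstract can be illustrated with a minimal sketch of random oversampling. This is not the authors' implementation: the function name `random_oversample`, the toy tweets, and the fixed seed are all assumptions made for illustration, and only the Python standard library is used. The idea is to duplicate randomly chosen minority-class samples until every class is as large as the largest one.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper: balance classes by duplicating random
    minority-class samples until each class matches the largest class."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)

    # Shuffle so that duplicated samples are not grouped together.
    order = list(range(len(out_texts)))
    rng.shuffle(order)
    return [out_texts[i] for i in order], [out_labels[i] for i in order]


# Toy data (made up): label 1 = hate speech, label 0 = not hate speech.
texts = ["tweet a", "tweet b", "tweet c", "tweet d", "tweet e", "tweet f"]
labels = [0, 0, 0, 0, 1, 1]

bal_texts, bal_labels = random_oversample(texts, labels)
print(Counter(labels))      # class counts before balancing
print(Counter(bal_labels))  # class counts after balancing
```

In practice, oversampling is usually applied to the training split only, so that duplicated samples do not leak into the test set and inflate the reported metrics.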
Pages: 5