Imbalanced Toxic Comments Classification using Data Augmentation and Deep Learning

Cited by: 35
Authors
Ibrahim, Mai [1]
Torki, Marwan [1]
El-Makky, Nagwa [1]
Affiliations
[1] Alexandria Univ, Comp & Syst Engn Dept, Alexandria, Egypt
Keywords
natural language processing; sentence classification; multi-label classification; deep learning; convolutional neural network; long short-term memory; gated recurrent units
DOI
10.1109/ICMLA.2018.00141
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cyber-bullying and online harassment have recently become two of the most serious issues in many public online communities. In this paper, we use data from Wikipedia talk page edits to train a multi-label classifier that detects different types of toxicity in online user-generated content. We present different data augmentation techniques to overcome the data imbalance problem in the Wikipedia dataset. The proposed solution is an ensemble of three models: a convolutional neural network (CNN), a bidirectional long short-term memory (LSTM) network, and a bidirectional gated recurrent unit (GRU) network. We divide the classification problem into two steps: first we determine whether or not the input is toxic, then we identify the types of toxicity present in the toxic content. The evaluation results show that the proposed ensemble approach provides the highest accuracy among all considered algorithms. It achieves an F1-score of 0.828 for toxic/non-toxic classification and 0.872 for toxicity type prediction.
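The two-step scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the three models are combined, so simple probability averaging is an assumption, as are the 0.5 threshold, the placeholder label names (`type_a`, `type_b`), and the helper names `ensemble_average`, `two_step_predict`, and `make_stub`. The stub models stand in for trained CNN, bidirectional LSTM, and bidirectional GRU networks.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Scores = Dict[str, float]
Model = Callable[[str], Scores]  # maps a comment to per-label probabilities


def ensemble_average(models: Sequence[Model], text: str) -> Scores:
    """Average per-label probabilities over all models (assumed combination rule)."""
    outputs = [model(text) for model in models]
    return {label: mean(out[label] for out in outputs) for label in outputs[0]}


def two_step_predict(
    text: str,
    binary_models: Sequence[Model],
    type_models: Sequence[Model],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Step 1: toxic vs. non-toxic. Step 2: toxicity types, only for toxic input."""
    p_toxic = ensemble_average(binary_models, text)["toxic"]
    if p_toxic < threshold:
        return []  # non-toxic comments skip the type classifier entirely
    type_scores = ensemble_average(type_models, text)
    return [label for label, p in type_scores.items() if p >= threshold]


def make_stub(scores: Scores) -> Model:
    """Hypothetical stand-in for a trained network: always returns fixed scores."""
    return lambda text: scores


# Three stubs per step play the role of the CNN, BiLSTM, and BiGRU models.
binary_models = [make_stub({"toxic": p}) for p in (0.9, 0.7, 0.8)]
type_models = [
    make_stub({"type_a": 0.9, "type_b": 0.2}),
    make_stub({"type_a": 0.6, "type_b": 0.4}),
    make_stub({"type_a": 0.6, "type_b": 0.3}),
]
predicted_types = two_step_predict("some comment", binary_models, type_models)
```

Gating the type classifier on the binary decision means the second-stage models only ever need to score content already judged toxic, which is one plausible reading of why the problem is split into two steps.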
Pages: 875-878
Number of pages: 4