Cyberbullying Detection: Hybrid Models Based on Machine Learning and Natural Language Processing Techniques

Cited by: 20
Authors
Raj, Chahat [1]
Agarwal, Ayush [2]
Bharathy, Gnana [1]
Narayan, Bhuva [3]
Prasad, Mukesh [1]
Affiliations
[1] Univ Technol Sydney, Sch Comp Sci, FEIT, Sydney, NSW 2007, Australia
[2] Delhi Technol Univ, Dept Informat Technol, Delhi 110042, India
[3] Univ Technol Sydney, Sch Commun, FASS, Sydney, NSW 2007, Australia
Keywords
cyberbullying; hate speech; offensive language; machine learning; neural networks; deep learning; natural language processing
DOI
10.3390/electronics10222810
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rise in web and social media interactions has resulted in the effortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user-generated content has made it challenging to identify such illicit content. Machine learning has wide applications in text classification, and researchers are shifting towards deep neural networks for detecting cyberbullying because of their advantages over traditional machine learning algorithms. This paper proposes a novel neural network framework with parameter optimization and an algorithmic comparative study of eleven classification methods (four traditional machine learning methods and seven shallow neural networks) on two real-world cyberbullying datasets. In addition, this paper examines the effect of natural language processing feature extraction and word embedding techniques on algorithmic performance. Key observations from this study show that bidirectional neural networks and attention models provide high classification results. Logistic Regression was observed to be the best among the traditional machine learning classifiers used. Term Frequency-Inverse Document Frequency (TF-IDF) demonstrates consistently high accuracies with traditional machine learning techniques. Global Vectors (GloVe) perform better with neural network models. Bi-GRU and Bi-LSTM worked best amongst the neural networks used. The extensive experiments performed on the two datasets establish the importance of this work by comparing eleven classification methods and seven feature extraction techniques. Our proposed shallow neural networks outperform existing state-of-the-art approaches for cyberbullying detection, with accuracy and F1-scores as high as ~95% and ~98%, respectively.
Pages: 20
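The abstract names TF-IDF as the feature extraction technique that worked best with the traditional classifiers. As a rough illustration of what such features look like, here is a minimal pure-Python sketch. It is not the authors' pipeline: the whitespace tokenizer, the smoothed IDF formula, the function names `tokenize` and `tfidf`, and the three example sentences are all assumptions made for this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = ln((1 + N) / (1 + df)) + 1  (a smoothed variant, assumed here;
          the abstract does not say which formula the paper uses)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


# Made-up sentences for illustration only; not taken from the paper's datasets.
docs = ["you are a loser", "you are a great friend", "what a great day"]
weights = tfidf(docs)
```

In this toy corpus, a term that appears in every document (such as "a") receives the lowest weight, while a term unique to one document (such as "loser") receives the highest. Vectors of this kind could then be passed to a classifier such as logistic regression.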