Cyberbullying Detection: Hybrid Models Based on Machine Learning and Natural Language Processing Techniques

Cited by: 20
Authors
Raj, Chahat [1]
Agarwal, Ayush [2]
Bharathy, Gnana [1]
Narayan, Bhuva [3]
Prasad, Mukesh [1]
Affiliations
[1] Univ Technol Sydney, Sch Comp Sci, FEIT, Sydney, NSW 2007, Australia
[2] Delhi Technol Univ, Dept Informat Technol, Delhi 110042, India
[3] Univ Technol Sydney, Sch Commun, FASS, Sydney, NSW 2007, Australia
Keywords
cyberbullying; hate speech; offensive language; machine learning; neural networks; deep learning; natural language processing
DOI
10.3390/electronics10222810
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rise in web and social media interactions has resulted in the effortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user-generated content has made such illicit content difficult to identify. Machine learning is widely applied to text classification, and researchers are increasingly turning to deep neural networks for cyberbullying detection because of the several advantages they offer over traditional machine learning algorithms. This paper proposes a novel neural network framework with parameter optimization and presents an algorithmic comparative study of eleven classification methods (four traditional machine learning classifiers and seven shallow neural networks) on two real-world cyberbullying datasets. In addition, this paper examines how natural-language-processing-based feature extraction and word-embedding techniques affect algorithmic performance. Key observations from this study show that bidirectional neural networks and attention models achieve high classification performance. Logistic Regression was the best of the traditional machine learning classifiers used. Term Frequency-Inverse Document Frequency (TF-IDF) yields consistently high accuracy with traditional machine learning techniques, whereas Global Vectors (GloVe) perform better with neural network models. Bi-GRU and Bi-LSTM performed best among the neural networks used. The extensive experiments on the two datasets, comparing eleven classification methods and seven feature extraction techniques, establish the importance of this work. Our proposed shallow neural networks outperform existing state-of-the-art approaches for cyberbullying detection, with accuracy and F1-scores as high as ~95% and ~98%, respectively.
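The abstract reports that TF-IDF features paired with Logistic Regression gave the strongest results among the traditional machine learning classifiers. The following is a minimal, dependency-free Python sketch of that combination, not the authors' implementation. The whitespace tokenizer, the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 (the default used by scikit-learn), L2 normalization, the plain full-batch gradient-descent trainer with its hyperparameters (`lr`, `epochs`, `l2`), and the toy texts are all illustrative assumptions. It also computes accuracy and F1-score, the two metrics the abstract quotes.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real pipelines also handle punctuation, emoji, etc.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    # Sparse TF-IDF vector (term -> weight), L2-normalized; terms unseen during fitting are dropped
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    # Linear logit w . x + b over a sparse vector
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def train_logreg(X, y, lr=0.5, epochs=200, l2=1e-3):
    # Full-batch gradient descent on L2-regularized logistic (log) loss
    w, b, n = {}, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, label in zip(X, y):
            err = sigmoid(score(w, b, x)) - label  # d(log-loss)/d(logit)
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t in set(w) | set(grad_w):
            w[t] = w.get(t, 0.0) - lr * (grad_w[t] / n + l2 * w.get(t, 0.0))
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(score(w, b, x)) >= 0.5 else 0


def accuracy_f1(y_true, y_pred):
    # Accuracy, and F1 for the positive (bullying) class: F1 = 2TP / (2TP + FP + FN)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return acc, f1


if __name__ == "__main__":
    # Hypothetical toy data: 1 = bullying, 0 = not bullying
    texts = [
        "you are so stupid and ugly",
        "nobody likes you loser",
        "go away you stupid loser",
        "great game last night",
        "thanks for sharing this",
        "see you at the meeting tomorrow",
    ]
    labels = [1, 1, 1, 0, 0, 0]
    idf = fit_idf(texts)
    X = [tfidf(t, idf) for t in texts]
    w, b = train_logreg(X, labels)
    preds = [predict(w, b, x) for x in X]
    print(accuracy_f1(labels, preds))
```

Scoring the training texts themselves, as the toy run does, only shows that the pipeline fits; a real evaluation would use a held-out split. In practice, scikit-learn's `TfidfVectorizer` and `LogisticRegression` would replace the hand-written parts; writing them out here only makes each step explicit.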
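The abstract also singles out bidirectional recurrent networks (Bi-GRU, Bi-LSTM) and attention models. The sketch below shows only the forward pass of a Bi-GRU encoder followed by attention pooling, in plain Python with random, untrained weights; it is not the paper's architecture, which the abstract does not specify. The gate convention h_t = (1 - z_t) * h_{t-1} + z_t * candidate, the dot-product attention against a single context vector, the layer sizes, and the random vectors standing in for GloVe embeddings are assumptions made for illustration.

```python
import math
import random


def sigmoid(v):
    # Numerically stable logistic function
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def matvec(W, x):
    # Matrix (list of rows) times vector
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vadd(*vs):
    # Element-wise sum of equally sized vectors
    return [sum(vals) for vals in zip(*vs)]


def init_gru(input_dim, hidden_dim, rng):
    # Hypothetical small random initialization; biases start at zero
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for gate in ("z", "r", "h"):  # update gate, reset gate, candidate state
        params["W" + gate] = mat(hidden_dim, input_dim)
        params["U" + gate] = mat(hidden_dim, hidden_dim)
        params["b" + gate] = [0.0] * hidden_dim
    return params


def gru_step(p, x, h):
    z = [sigmoid(v) for v in vadd(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(v) for v in vadd(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in vadd(matvec(p["Wh"], x), matvec(p["Uh"], reset_h), p["bh"])]
    # One common convention: h_t = (1 - z_t) * h_{t-1} + z_t * candidate
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(p, xs, hidden_dim):
    # Unroll the GRU over a sequence, returning the hidden state at every step
    h = [0.0] * hidden_dim
    states = []
    for x in xs:
        h = gru_step(p, x, h)
        states.append(h)
    return states


def bigru(p_fwd, p_bwd, xs, hidden_dim):
    # Forward pass reads left-to-right; backward pass reads the reversed sequence
    # and is re-reversed so both lists are aligned by token position
    fwd = run_gru(p_fwd, xs, hidden_dim)
    bwd = run_gru(p_bwd, xs[::-1], hidden_dim)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]  # list concatenation -> size 2 * hidden_dim


def attention_pool(states, context):
    # Score each token state against a context vector, softmax the scores,
    # and return the weighted sum of states together with the attention weights
    scores = [sum(c * s for c, s in zip(context, st)) for st in states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(a * st[i] for a, st in zip(alphas, states)) for i in range(dim)]
    return pooled, alphas


if __name__ == "__main__":
    rng = random.Random(0)
    emb_dim, hidden = 4, 3
    # Random vectors standing in for pre-trained (e.g. GloVe) embeddings of a 5-token text
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(emb_dim)] for _ in range(5)]
    states = bigru(init_gru(emb_dim, hidden, rng), init_gru(emb_dim, hidden, rng), xs, hidden)
    context = [rng.uniform(-1.0, 1.0) for _ in range(2 * hidden)]
    pooled, alphas = attention_pool(states, context)
    print(len(states), len(pooled), [round(a, 3) for a in alphas])
```

Concatenating the forward and backward states gives every token a representation that depends on both its left and right context; the attention weights then decide how much each token contributes to the sentence vector that a classification layer (omitted here) would consume. A Bi-LSTM variant would swap `gru_step` for an LSTM cell with a separate memory state.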
Pages: 20
Related papers (50 in total)
  • [1] Presumptive Detection of Cyberbullying on Twitter through Natural Language Processing and Machine Learning in the Spanish Language
    Leon-Paredes, Gabriel A.
    Palomeque-Leon, Wilson F.
    Gallegos-Segovia, Pablo L.
    Vintimilla-Tapia, Paul E.
    Bravo-Torres, Jack F.
    Barbosa-Santillan, Liliana I.
    Paredes-Pinos, Maria M.
    [J]. 2019 IEEE CHILEAN CONFERENCE ON ELECTRICAL, ELECTRONICS ENGINEERING, INFORMATION AND COMMUNICATION TECHNOLOGIES (CHILECON), 2019
  • [2] Detection of Cyberbullying Patterns in Low Resource Colloquial Roman Urdu Microtext using Natural Language Processing, Machine Learning, and Ensemble Techniques
    Dewani, Amirita
    Memon, Mohsin Ali
    Bhatti, Sania
    Sulaiman, Adel
    Hamdi, Mohammed
    Alshahrani, Hani
    Alghamdi, Abdullah
    Shaikh, Asadullah
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (04)
  • [3] A Review of Machine Learning Techniques in Cyberbullying Detection
    Sultan, Daniyar
    Omarov, Batyrkhan
    Kozhamkulova, Zhazira
    Kazbekova, Gulnur
    Alimzhanova, Laura
    Dautbayeva, Aigul
    Zholdassov, Yernar
    Abdrakhmanov, Rustam
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (03): 5625-5640
  • [4] Leveraging Reddit for Suicidal Ideation Detection: A Review of Machine Learning and Natural Language Processing Techniques
    Yeskuatov, Eldar
    Chua, Sook-Ling
    Foo, Lee Kien
    [J]. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2022, 19 (16)
  • [5] Cyberbullying Detection for Urdu Language Using Machine Learning
    Mustafa, Hamza
    Zafar, Kashif
    [J]. FORTHCOMING NETWORKS AND SUSTAINABILITY IN THE AIOT ERA, VOL 1, FONES-AIOT 2024, 2024, 1035: 244-257
  • [6] Machine Learning Techniques for Biomedical Natural Language Processing: A Comprehensive Review
    Houssein, Essam H.
    Mohamed, Rehab E.
    Ali, Abdelmgeid A.
    [J]. IEEE ACCESS, 2021, 9: 140628-140653
  • [7] Deep Learning-based Natural Language Processing Methods Comparison for Presumptive Detection of Cyberbullying in Social Networks
    Andrade-Segarra, Diego A.
    Leon-Paredes, Gabriel A.
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (05): 796-803
  • [8] Ensemble Techniques for Robust Fake News Detection: Integrating Transformers, Natural Language Processing, and Machine Learning
    Al-alshaqi, Mohammed
    Rawat, Danda B.
    Liu, Chunmei
    [J]. SENSORS, 2024, 24 (18)
  • [9] Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing
    Rubio-Martin, Sergio
    Garcia-Ordas, Maria Teresa
    Bayon-Gutierrez, Martin
    Prieto-Fernandez, Natalia
    Benitez-Andrades, Jose Alberto
    [J]. HEALTH INFORMATION SCIENCE AND SYSTEMS, 2024, 12 (01)
  • [10] An introduction to Deep Learning in Natural Language Processing: Models, techniques, and tools
    Lauriola, Ivano
    Lavelli, Alberto
    Aiolli, Fabio
    [J]. NEUROCOMPUTING, 2022, 470: 443-456