Cyberbullying Text Identification: A Deep Learning and Transformer-based Language Modeling Approach

Authors
Saifullah K. [1]
Khan M.I. [1]
Jamal S. [2]
Sarker I.H. [3]
Affiliations
[1] Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong
[2] Dept. of Information Technology, Georgia Southern University, Statesboro, GA
[3] Centre for Securing Digital Futures, School of Science, Edith Cowan University, Perth, 6027, WA
Keywords
Cyberbullying; deep learning; fine tuning; harmful messages; large language modeling; natural language processing (NLP); OOV; transformer models
DOI
10.4108/EETINIS.V11I1.4703
Abstract
In the contemporary digital age, social media platforms like Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, resources remain scarce for low-resource languages such as Bengali, Arabic, and Tamil, particularly in terms of language modeling. This study addresses this gap by developing a cyberbullying text identification system called BullyFilterNeT, tailored for social media texts, with Bengali as a test case. BullyFilterNeT overcomes the Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. To support a comprehensive comparison, three non-contextual embedding models (GloVe, FastText, and Word2Vec) are developed for feature extraction in Bengali. These embeddings feed the classification models: three statistical models (SVM, SGD, Libsvm) and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models (mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT) to overcome the limitations of the earlier models. BanglaBERT-based BullyFilterNeT achieves the highest accuracy, 88.04% on our test set, underscoring its effectiveness for cyberbullying text identification in the Bengali language.
Copyright © 2024 K. Saifullah et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.
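The OOV problem mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the vocabulary, the tokens, and the helper names (`char_ngrams`, `oov_rate`, `subword_coverage`) are hypothetical, and English toy words stand in for Bengali social media text. It shows why a word-level lookup table (as in Word2Vec or GloVe) has no entry for a creatively misspelled word, while character n-grams (the idea behind FastText's subword vectors) still overlap with words seen in training.

```python
# Toy illustration only: vocabulary and tokens are invented, not taken from the paper.

def char_ngrams(word, n_min=3, n_max=5):
    """Return character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def oov_rate(tokens, vocab):
    """Fraction of tokens that have no entry in a word-level vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t not in vocab)
    return missing / len(tokens)


def subword_coverage(word, known_ngrams):
    """Fraction of a word's character n-grams that were seen during training."""
    grams = char_ngrams(word)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in known_ngrams) / len(grams)


# Hypothetical training vocabulary and the n-grams derived from it.
vocab = {"you", "are", "stupid"}
known_ngrams = {g for w in vocab for g in char_ngrams(w)}

# "stuuupid" is an elongated spelling of the kind common in social media text.
tokens = ["you", "are", "stuuupid"]

print("word-level OOV rate:", oov_rate(tokens, vocab))
print("subword coverage of 'stuuupid':", subword_coverage("stuuupid", known_ngrams))
```

A word-level model would have to map "stuuupid" to an unknown token, whereas a subword model can still compose a vector from the shared n-grams. Transformer tokenizers (WordPiece, SentencePiece) address the same issue by splitting unseen words into known pieces.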
Pages: 1 / 12
Number of pages: 11