Cyberbullying Text Identification: A Deep Learning and Transformer-based Language Modeling Approach

Cited by: 0
Authors
Saifullah K. [1 ]
Khan M.I. [1 ]
Jamal S. [2 ]
Sarker I.H. [3 ]
Affiliations
[1] Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong
[2] Department of Information Technology, Georgia Southern University, Statesboro, GA
[3] Centre for Securing Digital Futures, School of Science, Edith Cowan University, Perth, 6027, WA
Keywords
Cyberbullying; deep learning; fine-tuning; harmful messages; large language modeling; natural language processing (NLP); out-of-vocabulary (OOV); transformer models
DOI
10.4108/EETINIS.V11I1.4703
Abstract
In the contemporary digital age, social media platforms such as Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, resources for low-resource languages such as Bengali, Arabic, and Tamil remain scarce, particularly in terms of language modeling. This study addresses that gap by developing BullyFilterNeT, a cyberbullying text identification system tailored to social media texts, with Bengali as a test case. BullyFilterNeT overcomes the Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. Three non-contextual embedding models (GloVe, FastText, and Word2Vec) are developed for feature extraction in Bengali and used in classification models comprising three statistical classifiers (SVM, SGD, LibSVM) and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models (mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT) to overcome the limitations of the earlier models. The BanglaBERT-based BullyFilterNeT achieves the highest accuracy, 88.04% on our test set, underscoring its effectiveness for cyberbullying text identification in Bengali. Copyright © 2024 K. Saifullah et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.
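To illustrate the transformer-based approach described in the abstract, the following is a minimal sketch of fine-tuning a BanglaBERT-style checkpoint for binary cyberbullying classification with the Hugging Face Transformers library. The checkpoint identifier, data file, column names, and hyperparameters are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch: fine-tune a Bengali transformer checkpoint for binary
# cyberbullying classification. Assumptions: the "csebuetnlp/banglabert"
# checkpoint and a CSV ("bully_train.csv") with "text" and "label" columns;
# hyperparameters are illustrative, not the paper's settings.
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "csebuetnlp/banglabert"  # assumed checkpoint; any Bengali BERT/ELECTRA works

class BullyDataset(Dataset):
    """Wraps tokenized social-media posts with 0/1 cyberbullying labels."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(list(texts), truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(list(labels), dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

def main():
    df = pd.read_csv("bully_train.csv")  # hypothetical training split
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

    loader = DataLoader(BullyDataset(df["text"], df["label"], tokenizer),
                        batch_size=16, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).train()

    for epoch in range(3):  # illustrative epoch count
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            out = model(**batch)      # returns cross-entropy loss when labels are given
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()

if __name__ == "__main__":
    main()
```

Evaluation on a held-out test split (the accuracy metric reported in the abstract) would follow the same tokenization path with `model.eval()` and `torch.no_grad()`.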
Pages: 1-12
Number of pages: 11