Cyberbullying Text Identification: A Deep Learning and Transformer-based Language Modeling Approach

Cited by: 0
Authors
Saifullah K. [1 ]
Khan M.I. [1 ]
Jamal S. [2 ]
Sarker I.H. [3 ]
Affiliations
[1] Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong
[2] Department of Information Technology, Georgia Southern University, Statesboro, GA
[3] Centre for Securing Digital Futures, School of Science, Edith Cowan University, Perth, 6027, WA
Keywords
Cyberbullying; deep learning; fine-tuning; harmful messages; large language modeling; natural language processing (NLP); OOV; transformer models
DOI
10.4108/EETINIS.V11I1.4703
Abstract
In the contemporary digital age, social media platforms such as Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, resources for low-resource languages such as Bengali, Arabic, and Tamil remain scarce, particularly in terms of language modeling. This study addresses this gap by developing a cyberbullying text identification system, BullyFilterNeT, tailored for social media texts, with Bengali as a test case. The BullyFilterNeT system overcomes the out-of-vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. To facilitate a comprehensive evaluation, three non-contextual embedding models, GloVe, FastText, and Word2Vec, are developed for feature extraction in Bengali. These embeddings are fed into the classification models, comprising three statistical models (SVM, SGD, LibSVM) and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models, mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT, to overcome the limitations of the earlier models. Remarkably, the BanglaBERT-based BullyFilterNeT achieves the highest accuracy of 88.04% on our test set, underscoring its effectiveness for cyberbullying text identification in Bengali. Copyright © 2024 K. Saifullah et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium, so long as the original work is properly cited.
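The abstract does not include implementation details, but the transformer-based pipeline it describes can be illustrated with a minimal fine-tuning sketch. The snippet below, written as an assumption-laden example rather than the authors' actual code, uses the Hugging Face transformers and datasets libraries to fine-tune a BanglaBERT checkpoint (assumed here to be csebuetnlp/banglabert) for binary bullying/non-bullying classification; the CSV schema, checkpoint name, and hyperparameters are illustrative choices, not the paper's reported configuration.

```python
# Illustrative sketch only: fine-tuning a BanglaBERT checkpoint for binary
# cyberbullying classification. The checkpoint name, CSV schema ("text", "label")
# and hyperparameters are assumptions, not the paper's actual setup.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL_NAME = "csebuetnlp/banglabert"  # assumed publicly available BanglaBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical CSV files with columns "text" (Bengali post) and "label" (0 = benign, 1 = bullying).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Subword tokenization sidesteps the OOV problem of static word embeddings.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bullyfilternet",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["test"])
trainer.train()
# Reports evaluation loss; add a compute_metrics function to obtain accuracy.
print(trainer.evaluate())
```

In the study's setting, the same fine-tuning loop would be repeated for each of the six transformer checkpoints and compared, on the held-out test set, against the embedding-plus-classifier baselines.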
Pages: 1-12
Number of pages: 11
Related papers
50 items in total
  • [21] Filling GRACE data gap using an innovative transformer-based deep learning approach
    Wang, Longhao
    Zhang, Yongqiang
    Remote Sensing of Environment, 2024, 315
  • [22] Transformer-Based Joint Learning Approach for Text Normalization in Vietnamese Automatic Speech Recognition Systems
    Viet The Bui
    Tho Chi Luong
    Oanh Thi Tran
    CYBERNETICS AND SYSTEMS, 2024, 55 (07) : 1614 - 1630
  • [23] Reward modeling for mitigating toxicity in transformer-based language models
    Farshid Faal
    Ketra Schmitt
    Jia Yuan Yu
    Applied Intelligence, 2023, 53 : 8421 - 8435
  • [24] Transformer-based Deep Learning Approach Predicts Glaucoma Surgical Intervention from OCT
    Christopher, Mark
    Gonzalez, Ruben
    Huynh, Justin
    Walker, Evan
    Saseendrakumar, Bharanidharan Radha
    Bowd, Christopher
    Belghith, Akram
    Goldbaum, Michael Henry
    Fazio, Massimo Antonio
    Girkin, Christopher A.
    De Moraes, C. Gustavo
    Liebmann, Jeffrey M.
    Weinreb, Robert N.
    Baxter, Sally
    Zangwill, Linda M.
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2023, 64 (08)
  • [26] Transformer-based Text Detection in the Wild
    Raisi, Zobeir
    Naiel, Mohamed A.
    Younes, Georges
    Wardell, Steven
    Zelek, John S.
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3156 - 3165
  • [27] A Transformer-based Approach for Translating Natural Language to Bash Commands
    Fu, Quchen
    Teng, Zhongwei
    White, Jules
    Schmidt, Douglas C.
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 1245 - 1248
  • [28] Oil species identification based on fluorescence excitation-emission matrix and transformer-based deep learning
    Xie, Ming
    Xie, Lei
    Li, Ying
    Han, Bing
    SPECTROCHIMICA ACTA PART A-MOLECULAR AND BIOMOLECULAR SPECTROSCOPY, 2023, 302
  • [29] Smart Home Notifications in Croatian Language: A Transformer-Based Approach
    Simunec, Magdalena
    Soic, Renato
    2023 17TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS, CONTEL, 2023,
  • [30] Cyberbullying Identification System Based Deep Learning Algorithms
    Aldhyani, Theyazn H. H.
    Al-Adhaileh, Mosleh Hmoud
    Alsubari, Saleh Nagi
    ELECTRONICS, 2022, 11 (20)