Cyberbullying Text Identification: A Deep Learning and Transformer-based Language Modeling Approach

Cited by: 0
Authors
Saifullah K. [1]
Khan M.I. [1]
Jamal S. [2]
Sarker I.H. [3]
Affiliations
[1] Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong
[2] Dept. of Information Technology, Georgia Southern University, Statesboro, GA
[3] Centre for Securing Digital Futures, School of Science, Edith Cowan University, Perth, 6027, WA
Keywords
Cyberbullying; deep learning; fine tuning; harmful messages; large language modeling; natural language processing (NLP); OOV; transformer models
DOI
10.4108/EETINIS.V11I1.4703
Abstract
In the contemporary digital age, social media platforms like Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, resources remain scarce for low-resource languages such as Bengali, Arabic, and Tamil, particularly in terms of language modeling. This study addresses this gap by developing a cyberbullying text identification system called BullyFilterNeT, tailored for social media texts, with Bengali as a test case. The BullyFilterNeT system overcomes the Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. To facilitate a comprehensive understanding, three non-contextual embedding models (GloVe, FastText, and Word2Vec) are developed for feature extraction in Bengali. These embeddings are used as input to the classification models: three statistical models (SVM, SGD, Libsvm) and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models (mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT) to overcome the limitations of the earlier models. Notably, BanglaBERT-based BullyFilterNeT achieves the highest accuracy, 88.04% on our test set, underscoring its effectiveness in cyberbullying text identification in the Bengali language. Copyright © 2024 K. Saifullah et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.
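The OOV problem mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy English vocabulary, and the n-gram range are all assumptions chosen for illustration. It shows only why a word-level lookup fails on an unseen word while a subword (character n-gram) representation, the general idea behind FastText-style embeddings, still finds overlapping features.

```python
def char_ngrams(word, n_min=3, n_max=5):
    """Return the set of character n-grams of `word`, with boundary markers.

    The '<' and '>' markers distinguish prefixes and suffixes from
    word-internal n-grams. The 3..5 range is an illustrative assumption.
    """
    marked = f"<{word}>"
    return {
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    }


def build_ngram_index(vocab):
    """Collect every character n-gram seen in the (hypothetical) training vocabulary."""
    index = set()
    for word in vocab:
        index |= char_ngrams(word)
    return index


def subword_coverage(word, vocab, ngram_index):
    """Return (is_in_vocab, fraction of the word's n-grams already known)."""
    grams = char_ngrams(word)
    if not grams:
        return word in vocab, 0.0
    shared = grams & ngram_index
    return word in vocab, len(shared) / len(grams)


# Toy vocabulary (hypothetical); a real system would use Bengali tokens.
vocab = {"bully", "bullying", "harass"}
ngram_index = build_ngram_index(vocab)

in_vocab, coverage = subword_coverage("bullies", vocab, ngram_index)
print(in_vocab, round(coverage, 2))
```

Here "bullies" is absent from the word-level vocabulary, yet part of its character n-grams already occur in known words, so a subword model has something to build a vector from. Contextual transformer models address the same issue with subword tokenizers rather than fixed word tables.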
Pages: 1 / 12
Number of pages: 11