Cyberbullying Text Identification: A Deep Learning and Transformer-based Language Modeling Approach

Cited by: 0

Authors:

Saifullah K. [1]

Khan M.I. [1]

Jamal S. [2]

Sarker I.H. [3]

Affiliations:

[1] Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong

[2] Dept. of Information Technology, Georgia Southern University, Statesboro, GA

[3] Centre for Securing Digital Futures, School of Science, Edith Cowan University, Perth, 6027, WA

Source:

EAI Endorsed Transactions on Industrial Networks and Intelligent Systems | 2024 / Vol. 11 / No. 1

Keywords:

Cyberbullying; deep learning; fine-tuning; harmful messages; large language modeling; natural language processing (NLP); OOV; transformer models

DOI:

10.4108/EETINIS.V11I1.4703

Abstract:

In the contemporary digital age, social media platforms such as Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. While fostering greater connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. Although extensive research has been conducted on high-resource languages such as English, resources for low-resource languages such as Bengali, Arabic, and Tamil remain scarce, particularly for language modeling. This study addresses this gap by developing BullyFilterNeT, a cyberbullying text identification system tailored to social media texts, with Bengali as a test case. BullyFilterNeT overcomes the Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. For a comprehensive comparison, three non-contextual embedding models (GloVe, FastText, and Word2Vec) are developed for feature extraction in Bengali. These embeddings are used in three statistical classifiers (SVM, SGD, and LibSVM) and four deep learning models (CNN, VDCNN, LSTM, and GRU). To overcome the limitations of these earlier models, the study additionally employs six transformer-based language models: mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT. BanglaBERT-based BullyFilterNeT achieves the highest accuracy, 88.04% on our test set, underscoring its effectiveness for identifying cyberbullying text in Bengali.

Copyright © 2024 K. Saifullah et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.
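The abstract states that BullyFilterNeT overcomes the Out-of-Vocabulary (OOV) challenges of non-contextual embeddings: a word-level lookup table such as Word2Vec or GloVe has no vector for a word it never saw in training. The sketch below is not the paper's method; it only illustrates the general subword mechanism used by fastText, in which a word is wrapped in `<` and `>` and represented through its character n-grams (lengths 3 to 6), so an unseen or misspelled word still receives a vector. The n-gram vectors are hypothetical deterministic pseudo-random stand-ins for trained ones, `DIM` is an arbitrary toy size, and `char_ngrams`, `ngram_vector`, `word_vector`, and `cosine` are illustrative names, not any library's API.

```python
import hashlib
import math
import random

DIM = 64  # toy embedding size, an assumption for this sketch


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of `word` after wrapping it in the boundary markers < and >."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def ngram_vector(gram):
    """Hypothetical stand-in for a trained n-gram vector.

    Seeds a PRNG with an MD5 digest of the n-gram so the same n-gram always
    maps to the same vector across runs (Python's built-in hash() is salted).
    """
    seed = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def word_vector(word, vocab):
    """Average the word's n-gram vectors, plus its own vector if `word` is in `vocab`.

    An OOV word has no whole-word entry, but it still gets a vector
    built from its character n-grams.
    """
    pieces = [ngram_vector(g) for g in char_ngrams(word)]
    if word in vocab:
        pieces.append(vocab[word])
    if not pieces:
        return [0.0] * DIM
    return [sum(column) / len(pieces) for column in zip(*pieces)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vocab = {}  # hypothetical: no trained whole-word vectors in this toy
    known = word_vector("bullying", vocab)
    typo = word_vector("bullyingg", vocab)  # unseen misspelling still gets a vector
    other = word_vector("xqzvw", vocab)
    print(f"bullying vs bullyingg: {cosine(known, typo):.2f}")
    print(f"bullying vs xqzvw:     {cosine(known, other):.2f}")
```

Because the misspelling shares most of its n-grams with the original word, its vector should land much closer to it than an unrelated string's does, whereas a pure word-level table would have no entry for it at all. How the paper itself turns word vectors into classifier inputs is not stated in the abstract.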

Pages: 1 / 12

Number of pages: 11
