Cyberbullying Text Identification: A Deep Learning and Transformer-based Language Modeling Approach

Cited by: 0
Authors
Saifullah K. [1 ]
Khan M.I. [1 ]
Jamal S. [2 ]
Sarker I.H. [3 ]
Affiliations
[1] Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong
[2] Department of Information Technology, Georgia Southern University, Statesboro, GA
[3] Centre for Securing Digital Futures, School of Science, Edith Cowan University, Perth, 6027, WA
Keywords
Cyberbullying; deep learning; fine-tuning; harmful messages; large language modeling; natural language processing (NLP); OOV; transformer models
DOI
10.4108/EETINIS.V11I1.4703
Abstract
In the contemporary digital age, social media platforms such as Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, resources for low-resource languages such as Bengali, Arabic, and Tamil remain scarce, particularly in terms of language modeling. This study addresses this gap by developing a cyberbullying text identification system, BullyFilterNeT, tailored for social media texts, with Bengali as a test case. The BullyFilterNeT system overcomes the out-of-vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. To facilitate a comprehensive evaluation, three non-contextual embedding models, GloVe, FastText, and Word2Vec, are developed for feature extraction in Bengali. These embeddings are fed into the classification models, comprising three statistical models (SVM, SGD, LibSVM) and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models, mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT, to overcome the limitations of the earlier models. Remarkably, the BanglaBERT-based BullyFilterNeT achieves the highest accuracy of 88.04% on our test set, underscoring its effectiveness for cyberbullying text identification in Bengali. Copyright © 2024 K. Saifullah et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium, so long as the original work is properly cited.
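The abstract does not include implementation details, but the transformer-based pipeline it describes can be illustrated with a minimal fine-tuning sketch. The snippet below, written as an assumption-laden example rather than the authors' actual code, uses the Hugging Face transformers and datasets libraries to fine-tune a BanglaBERT checkpoint (assumed here to be csebuetnlp/banglabert) for binary bullying/non-bullying classification; the CSV schema, checkpoint name, and hyperparameters are illustrative choices, not the paper's reported configuration.

```python
# Illustrative sketch only: fine-tuning a BanglaBERT checkpoint for binary
# cyberbullying classification. The checkpoint name, CSV schema ("text", "label")
# and hyperparameters are assumptions, not the paper's actual setup.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL_NAME = "csebuetnlp/banglabert"  # assumed publicly available BanglaBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical CSV files with columns "text" (Bengali post) and "label" (0 = benign, 1 = bullying).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Subword tokenization sidesteps the OOV problem of static word embeddings.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bullyfilternet",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["test"])
trainer.train()
# Reports evaluation loss; add a compute_metrics function to obtain accuracy.
print(trainer.evaluate())
```

In the study's setting, the same fine-tuning loop would be repeated for each of the six transformer checkpoints and compared, on the held-out test set, against the embedding-plus-classifier baselines.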
Pages: 1-12
Number of pages: 11
Related papers
50 items in total
  • [21] Filling GRACE data gap using an innovative transformer-based deep learning approach
    Wang, Longhao
    Zhang, Yongqiang
    Remote Sensing of Environment, 2024, 315
  • [22] Transformer-Based Joint Learning Approach for Text Normalization in Vietnamese Automatic Speech Recognition Systems
    Viet The Bui
    Tho Chi Luong
    Oanh Thi Tran
    CYBERNETICS AND SYSTEMS, 2024, 55 (07) : 1614 - 1630
  • [23] Reward modeling for mitigating toxicity in transformer-based language models
    Farshid Faal
    Ketra Schmitt
    Jia Yuan Yu
    Applied Intelligence, 2023, 53 : 8421 - 8435
  • [24] Transformer-based Deep Learning Approach Predicts Glaucoma Surgical Intervention from OCT
    Christopher, Mark
    Gonzalez, Ruben
    Huynh, Justin
    Walker, Evan
    Saseendrakumar, Bharanidharan Radha
    Bowd, Christopher
    Belghith, Akram
    Goldbaum, Michael Henry
    Fazio, Massimo Antonio
    Girkin, Christopher A.
    De Moraes, C. Gustavo
    Liebmann, Jeffrey M.
    Weinreb, Robert N.
    Baxter, Sally
    Zangwill, Linda M.
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2023, 64 (08)
  • [26] Transformer-based Text Detection in the Wild
    Raisi, Zobeir
    Naiel, Mohamed A.
    Younes, Georges
    Wardell, Steven
    Zelek, John S.
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3156 - 3165
  • [27] A Transformer-based Approach for Translating Natural Language to Bash Commands
    Fu, Quchen
    Teng, Zhongwei
    White, Jules
    Schmidt, Douglas C.
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 1245 - 1248
  • [28] Oil species identification based on fluorescence excitation-emission matrix and transformer-based deep learning
    Xie, Ming
    Xie, Lei
    Li, Ying
    Han, Bing
    SPECTROCHIMICA ACTA PART A-MOLECULAR AND BIOMOLECULAR SPECTROSCOPY, 2023, 302
  • [29] Smart Home Notifications in Croatian Language: A Transformer-Based Approach
    Simunec, Magdalena
    Soic, Renato
    2023 17TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS, CONTEL, 2023,
  • [30] Cyberbullying Identification System Based Deep Learning Algorithms
    Aldhyani, Theyazn H. H.
    Al-Adhaileh, Mosleh Hmoud
    Alsubari, Saleh Nagi
    ELECTRONICS, 2022, 11 (20)