Deep-BERT: Transfer Learning for Classifying Multilingual Offensive Texts on Social Media

Cited by: 18

Authors:

Wadud, Md Anwar Hussen ^{[1]}

Mridha, M. F. ^{[1]}

Shin, Jungpil ^{[2]}

Nur, Kamruddin ^{[3]}

Saha, Aloke Kumar ^{[4]}

Affiliations:

[1] Bangladesh Univ Business & Technol, Dept Comp Sci & Engn, Dhaka, Bangladesh

[2] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima, Japan

[3] Amer Int Univ Bangladesh, Dept Comp Sci, Dhaka, Bangladesh

[4] Univ Asia Pacific, Dept Comp Sci & Engn, Dhaka, Bangladesh

Source:

COMPUTER SYSTEMS SCIENCE AND ENGINEERING | 2023 / Vol. 44 / Issue 02

Keywords:

Offensive text classification; deep convolutional neural network (DCNN); bidirectional encoder representations from transformers (BERT); natural language processing (NLP)

DOI:

10.32604/csse.2023.027841

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Offensive messages on social media have recently been used frequently to harass and criticize people. Recent studies have developed many promising algorithms to identify offensive texts. Most of these algorithms analyze text in a unidirectional manner, whereas a bidirectional method can maximize performance and capture the semantic and contextual information in sentences. In addition, there are many separate models for identifying offensive texts in either monolingual or multilingual settings, but only a few models can detect both monolingual and multilingual offensive texts. In this study, a detection system for both monolingual and multilingual offensive texts was developed by combining a deep convolutional neural network with bidirectional encoder representations from transformers (Deep-BERT) to identify offensive posts on social media that are used to harass others. The paper explores several ways of dealing with multilingualism, including collaborative multilingual and translation-based approaches. Deep-BERT was then tested on Bengali and English datasets with different BERT pre-trained word-embedding techniques, and it outperformed all existing offensive text classification algorithms, reaching an accuracy of 91.83%. The proposed model is a state-of-the-art model that can classify both monolingual and multilingual offensive texts.
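The abstract does not state layer sizes, filter widths, or training details, so the following is only a rough, assumption-laden sketch of the general idea of placing a convolutional classifier on top of contextual token embeddings. Every name here (`conv1d_max_pool`, `classify`, the weights) is hypothetical, the random vectors merely stand in for BERT outputs, and nothing below should be read as the authors' actual Deep-BERT architecture.

```python
import math
import random


def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter along the token axis, apply ReLU, then max-pool over time.

    embeddings: list of token vectors (stand-ins for BERT outputs; an assumption).
    kernel: list of weight vectors, one per token position the filter spans.
    """
    width = len(kernel)
    best = 0.0  # starting at zero folds the ReLU into the running maximum
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = bias + sum(
            w * x
            for k_row, token in zip(kernel, window)
            for w, x in zip(k_row, token)
        )
        best = max(best, activation)
    return best


def classify(embeddings, kernels, out_weights, out_bias=0.0):
    """Return a probability in (0, 1) that the text is offensive (toy model)."""
    features = [conv1d_max_pool(embeddings, k) for k in kernels]
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_tokens, n_filters, width = 8, 6, 4, 3  # illustrative sizes only
    tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
    kernels = [
        [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)]
        for _ in range(n_filters)
    ]
    out_weights = [rng.uniform(-1, 1) for _ in range(n_filters)]
    print(classify(tokens, kernels, out_weights))
```

In a real system the token vectors would come from a pre-trained (possibly multilingual) BERT encoder and the filter weights would be learned rather than sampled at random.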


Pages: 1775-1791

Page count: 17
