Deep-BERT: Transfer Learning for Classifying Multilingual Offensive Texts on Social Media

Cited by: 18
Authors
Wadud, Md Anwar Hussen [1]
Mridha, M. F. [1]
Shin, Jungpil [2]
Nur, Kamruddin [3]
Saha, Aloke Kumar [4]
Affiliations
[1] Bangladesh Univ Business & Technol, Dept Comp Sci & Engn, Dhaka, Bangladesh
[2] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima, Japan
[3] Amer Int Univ Bangladesh, Dept Comp Sci, Dhaka, Bangladesh
[4] Univ Asia Pacific, Dept Comp Sci & Engn, Dhaka, Bangladesh
Source
Keywords
Offensive text classification; deep convolutional neural network (DCNN); bidirectional encoder representations from transformers (BERT); natural language processing (NLP)
DOI
10.32604/csse.2023.027841
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Offensive messages on social media have recently been used frequently to harass and criticize people. Recent studies have developed many promising algorithms to identify offensive texts. Most of these algorithms analyze text in a unidirectional manner, whereas a bidirectional method can improve performance and capture the semantic and contextual information in sentences. In addition, there are many separate models for identifying offensive texts in either a monolingual or a multilingual setting, but few models can detect offensive texts in both. In this study, a detection system for both monolingual and multilingual offensive texts has been developed by combining a deep convolutional neural network with bidirectional encoder representations from transformers (Deep-BERT) to identify offensive posts on social media that are used to harass others. This paper explores a variety of ways to deal with multilingualism, including collaborative multilingual and translation-based approaches. Deep-BERT is then tested on Bengali and English datasets with different BERT pre-trained word-embedding techniques, and it is found to outperform all existing offensive text classification algorithms, reaching an accuracy of 91.83%. The proposed model is a state-of-the-art model that can classify both monolingual and multilingual offensive texts.
Pages: 1775-1791
Number of pages: 17
Related papers
50 in total
  • [21] FakeBERT: Fake news detection in social media with a BERT-based deep learning approach
    Kaliyar, Rohit Kumar
    Goswami, Anurag
    Narang, Pratik
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (08) : 11765 - 11788
  • [22] Less Annotating, More Classifying: Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI
    Laurer, Moritz
    van Atteveldt, Wouter
    Casas, Andreu
    Welbers, Kasper
    [J]. POLITICAL ANALYSIS, 2024, 32 (01) : 84 - 100
  • [23] Classifying the Ideological Orientation of User-Submitted Texts in Social Media
    Ravi, Kamalakkannan
    Vela, Adan Ernesto
    Ewetz, Rickard
    [J]. 2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 413 - 418
  • [24] Vector Space Representations of Documents in Classifying Finnish Social Media Texts
    Venekoski, Viljami
    Puuska, Samir
    Vankka, Jouko
    [J]. INFORMATION AND SOFTWARE TECHNOLOGIES, ICIST 2016, 2016, 639 : 525 - 535
  • [25] A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media
    Mozafari, Marzieh
    Farahbakhsh, Reza
    Crespi, Noel
    [J]. COMPLEX NETWORKS AND THEIR APPLICATIONS VIII, VOL 1, 2020, 881 : 928 - 940
  • [26] A Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data for Heterogeneous Social Media
    Bacha, Jamshid
    Ullah, Farman
    Khan, Jebran
    Sardar, Abdul Wasay
    Lee, Sungchang
    [J]. IEEE ACCESS, 2023, 11 : 124484 - 124498
  • [27] mBERT-GRU multilingual deep learning framework for hate speech detection in social media
    Singh, Pardeep
    Singh, Nitin Kumar
    Monika
    Chand, Satish
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (05) : 8177 - 8192
  • [28] Offensive Language Detection on Social Media using Machine Learning
    Abdrakhmanov, Rustam
    Kenesbayev, Serik Muktarovich
    Berkimbayev, Kamalbek
    Toikenov, Gumyrbek
    Abdrashova, Elmira
    Alchinbayeva, Oichagul
    Ydyrys, Aizhan
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (05) : 575 - 582
  • [29] Negative Stances Detection from Multilingual Data Streams in Low-Resource Languages on Social Media Using BERT and CNN-Based Transfer Learning Model
    Kumar, Sanjay
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (01)
  • [30] L-Boost: Identifying Offensive Texts From Social Media Post in Bengali
    Mridha, M. F.
    Wadud, Md Anwar Hussen
    Hamid, Md Abdul
    Monowar, Muhammad Mostafa
    Abdullah-Al-Wadud, M.
    Alamri, Atif
    [J]. IEEE ACCESS, 2021, 9 : 164681 - 164699