Unsupervised offensive speech detection for multimedia based on multilingual BERT

Cited by: 0

Authors:

Liu, Ge [1]

Yang, Xiaona [2]

Shi, Xiayang [2]

Li, Yinlin [3]

Affiliations:

[1] Xuchang Vocat & Tech Coll, Xuchang 461000, Henan, Peoples R China

[2] Zhengzhou Univ Light Ind, Software Engn Coll, Zhengzhou 450000, Henan, Peoples R China

[3] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

Source:

INTERNATIONAL JOURNAL OF SENSOR NETWORKS | 2024 / Vol. 46 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

natural language processing; offensive speech detection; social media;

DOI:

10.1504/IJSNET.2024.142516

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

There is a significant amount of offensive speech in multimedia, which seriously negatively impacts social stability. With the proliferation of sensor-equipped devices contributing to social media data, detecting offensive speech within this vast dataset has emerged as a critical challenge. However, most existing methods have focused only on a few high-resource languages. This paper proposes a cross-lingual aggressive transfer learning method based on bidirectional encoder representations from transformers (BERT) for automatically detecting offensive speech in low-resource languages. Initially, we utilise the multilingual BERT model to learn the characteristics of aggressive speech from a high-resource language dataset to establish an initial model. Subsequently, based on the linguistic similarity between languages, this model is transferred to low-resource languages. Experimental results demonstrate that our method achieves higher detection accuracy in multiple languages including English, Danish, Arabic, Turkish, and Greek, particularly excelling in low-resource languages.
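
The two-stage idea in the abstract (learn from a labelled high-resource language, then apply the model to a low-resource language without target labels) can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written `SHARED_EMBEDDINGS` table is a hypothetical stand-in for the shared representation space that multilingual BERT would provide, the logistic-regression classifier replaces fine-tuning, and the example words, vectors, and function names are all assumptions made for illustration. The language-similarity step mentioned in the abstract is not modelled.

```python
import math

# Hypothetical stand-in for a shared multilingual encoder such as multilingual BERT:
# words from both languages are mapped into one toy 2-dimensional space.
SHARED_EMBEDDINGS = {
    # English (high-resource source language, labelled)
    "stupid": (0.9, 0.1), "idiot": (0.8, 0.2),
    "friend": (0.1, 0.9), "thanks": (0.2, 0.8),
    # Danish (low-resource target language, no labels used)
    "dum": (0.85, 0.15), "ven": (0.15, 0.85), "tak": (0.2, 0.9),
}

# Labelled source-language data (1 = offensive, 0 = not offensive); toy examples only.
SOURCE_EXAMPLES = [
    ("stupid idiot", 1),
    ("idiot", 1),
    ("thanks friend", 0),
    ("friend", 0),
]


def encode(text):
    """Average the shared-space vectors of the known words in the text."""
    vecs = [SHARED_EMBEDDINGS[w] for w in text.lower().split() if w in SHARED_EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))


def train(examples, epochs=200, lr=0.5):
    """Stage 1: fit a logistic-regression classifier on the source language with SGD."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            grad = p - label  # gradient of the log loss w.r.t. the logit
            w[0] -= lr * grad * x[0]
            w[1] -= lr * grad * x[1]
            b -= lr * grad
    return w, b


def predict(model, text):
    """Stage 2: apply the source-trained classifier unchanged to any language."""
    w, b = model
    x = encode(text)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


if __name__ == "__main__":
    model = train(SOURCE_EXAMPLES)
    for danish_text in ["dum", "tak ven"]:
        print(danish_text, "->", predict(model, danish_text))
```

The transfer works in this sketch only because both languages share one embedding space, which is the property a multilingual encoder is meant to supply; in practice the quality of that alignment varies by language pair.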


Pages: 186 - 196

Page count: 12
