Unsupervised offensive speech detection for multimedia based on multilingual BERT

Times cited: 0

Authors:

Liu, Ge [1]

Yang, Xiaona [2]

Shi, Xiayang [2]

Li, Yinlin [3]

Affiliations:

[1] Xuchang Vocat & Tech Coll, Xuchang 461000, Henan, Peoples R China

[2] Zhengzhou Univ Light Ind, Software Engn Coll, Zhengzhou 450000, Henan, Peoples R China

[3] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

Source:

INTERNATIONAL JOURNAL OF SENSOR NETWORKS | 2024 / Vol. 46 / No. 3

Funding:

National Natural Science Foundation of China

Keywords:

natural language processing; offensive speech detection; social media

DOI:

10.1504/IJSNET.2024.142516

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

There is a significant amount of offensive speech in multimedia content, and it seriously undermines social stability. As sensor-equipped devices contribute ever more data to social media, detecting offensive speech within this vast volume of data has become a critical challenge. However, most existing methods focus on only a few high-resource languages. This paper proposes a cross-lingual transfer learning method for offensive speech, based on bidirectional encoder representations from transformers (BERT), that automatically detects offensive speech in low-resource languages. First, we use the multilingual BERT model to learn the characteristics of offensive speech from a high-resource language dataset and build an initial model. Then, guided by the linguistic similarity between languages, this model is transferred to low-resource languages. Experimental results show that our method achieves higher detection accuracy in multiple languages, including English, Danish, Arabic, Turkish, and Greek, and performs particularly well in low-resource languages.
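
The abstract describes two stages: learn offensive-speech features from labelled data in a high-resource language, then reuse the model for a low-resource language through a representation shared across languages. The following is only a toy illustration of that transfer idea under stated assumptions, not the authors' implementation. A hand-built bilingual lexicon (the hypothetical `SHARED_LEXICON`) stands in for multilingual BERT's shared embedding space, and a small logistic-regression classifier stands in for the fine-tuned model. English plays the source language and Danish the target; all sentences, names (`encode`, `train_logreg`, `predict`) and concept labels are made up for illustration, and the language-similarity selection step is not modelled.

```python
import math

# Hypothetical toy stand-in for a multilingual encoder such as mBERT:
# words from two languages are mapped onto shared, language-independent
# "concept" features by a hand-built lexicon (an assumption for illustration).
SHARED_LEXICON = {
    # English (high-resource source language)
    "stupid": "INSULT", "fool": "INSULT", "hate": "HOSTILE",
    "thanks": "POLITE", "great": "POSITIVE",
    # Danish (low-resource target language)
    "dum": "INSULT", "fjols": "INSULT", "hader": "HOSTILE",
    "tak": "POLITE", "fantastisk": "POSITIVE",
}
CONCEPTS = sorted(set(SHARED_LEXICON.values()))


def encode(text: str) -> list[float]:
    """Map a sentence to counts of shared concepts; unknown words are ignored."""
    counts = [0.0] * len(CONCEPTS)
    for token in text.lower().split():
        concept = SHARED_LEXICON.get(token)
        if concept is not None:
            counts[CONCEPTS.index(concept)] += 1.0
    return counts


def train_logreg(xs, ys, epochs=500, lr=0.5):
    """Fit a logistic-regression classifier with batch gradient descent."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - label
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def predict(model, text: str) -> int:
    """Return 1 (offensive) or 0 (not offensive)."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, encode(text))) + b
    return int(z > 0)


# Stage 1: train only on labelled source-language (English) examples.
english = [
    ("you are stupid", 1), ("i hate you", 1), ("i hate you fool", 1),
    ("thanks a lot", 0), ("great work", 0), ("thanks great work", 0),
]
model = train_logreg([encode(t) for t, _ in english], [y for _, y in english])

# Stage 2: apply the unchanged model to target-language (Danish) text.
danish = ["du er dum", "jeg hader dig", "tak for hjælpen", "fantastisk arbejde"]
print([predict(model, s) for s in danish])  # → [1, 1, 0, 0]
```

The point the toy preserves is that the classifier never sees a labelled Danish example: it works on Danish only because both languages land in the same feature space. In the real setting that space would come from a multilingual BERT encoder rather than a lexicon.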

Pages: 186-196

Number of pages: 12
