Unsupervised offensive speech detection for multimedia based on multilingual BERT

Authors
Liu, Ge [1]
Yang, Xiaona [2]
Shi, Xiayang [2]
Li, Yinlin [3]
Affiliations
[1] Xuchang Vocat & Tech Coll, Xuchang 461000, Henan, Peoples R China
[2] Zhengzhou Univ Light Ind, Software Engn Coll, Zhengzhou 450000, Henan, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
natural language processing; offensive speech detection; social media
DOI
10.1504/IJSNET.2024.142516
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multimedia content contains a significant amount of offensive speech, which seriously harms social stability. With the proliferation of sensor-equipped devices contributing to social media data, detecting offensive speech within this vast body of data has emerged as a critical challenge. However, most existing methods focus on only a few high-resource languages. This paper proposes a cross-lingual aggressive transfer learning method based on bidirectional encoder representations from transformers (BERT) for automatically detecting offensive speech in low-resource languages. First, we use the multilingual BERT model to learn the characteristics of aggressive speech from a high-resource language dataset and establish an initial model. Then, based on the linguistic similarity between languages, this model is transferred to low-resource languages. Experimental results demonstrate that our method achieves higher detection accuracy in multiple languages including English, Danish, Arabic, Turkish, and Greek, particularly excelling in low-resource languages.
Pages: 186-196
Page count: 12