Unsupervised offensive speech detection for multimedia based on multilingual BERT

Authors
Liu, Ge [1]
Yang, Xiaona [2]
Shi, Xiayang [2]
Li, Yinlin [3]
Affiliations
[1] Xuchang Vocat & Tech Coll, Xuchang 461000, Henan, Peoples R China
[2] Zhengzhou Univ Light Ind, Software Engn Coll, Zhengzhou 450000, Henan, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
natural language processing; offensive speech detection; social media
DOI
10.1504/IJSNET.2024.142516
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Multimedia content contains a significant amount of offensive speech, which seriously undermines social stability. As sensor-equipped devices proliferate and contribute ever more data to social media, detecting offensive speech in this vast body of data has become a critical challenge. However, most existing methods focus on only a few high-resource languages. This paper proposes a cross-lingual transfer learning method, based on bidirectional encoder representations from transformers (BERT), for automatically detecting offensive speech in low-resource languages. First, we use the multilingual BERT model to learn the characteristics of offensive speech from a high-resource language dataset, establishing an initial model. Then, based on the linguistic similarity between languages, this model is transferred to low-resource languages. Experimental results demonstrate that our method achieves higher detection accuracy across multiple languages, including English, Danish, Arabic, Turkish, and Greek, and performs particularly well in low-resource languages.
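The transfer idea in the abstract (learn from labelled high-resource data, then apply the model to a low-resource language) can be illustrated with a toy sketch. This is not the paper's method: it swaps multilingual BERT fine-tuning for a nearest-centroid classifier, and the 2-d vectors are hand-written stand-ins for embeddings from a shared multilingual encoder. All names (`SOURCE_TRAIN`, `TARGET_INPUTS`, `fit_centroids`, `predict`) are hypothetical.

```python
import math

# Toy 2-d vectors standing in for sentence embeddings from a shared
# multilingual encoder (hypothetical values, not real mBERT output).
# Label 1 = offensive, 0 = not offensive.
SOURCE_TRAIN = [  # labelled high-resource language examples
    ([0.9, 0.1], 1),
    ([0.8, 0.3], 1),
    ([0.1, 0.9], 0),
    ([0.2, 0.7], 0),
]
TARGET_INPUTS = [[0.7, 0.2], [0.3, 0.9]]  # unlabelled low-resource inputs


def fit_centroids(examples):
    """Average the source-language vectors of each class."""
    grouped = {}
    for vector, label in examples:
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest to the vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Fit on the source language only, then classify target-language inputs
# without using any target-language labels.
centroids = fit_centroids(SOURCE_TRAIN)
predictions = [predict(centroids, v) for v in TARGET_INPUTS]
print(predictions)
```

The sketch only works because both languages are assumed to share one embedding space. In the setting the abstract describes, that shared space would come from the multilingual BERT model rather than from hand-picked vectors.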
Pages: 186 - 196
Page count: 12