Artificial Intelligence inspired method for cross-lingual cyberhate detection from low resource languages

Cited by: 0
Authors
Kaur, Manpreet [1]
Saini, Munish [1]
Affiliation
[1] Guru Nanak Dev Univ, Dept Comp Engn & Technol, Amritsar, Punjab, India
Keywords
Artificial Intelligence; cross-lingual; cyberhate; low resource languages; social media; HIGHER-EDUCATION; HATE SPEECH; COMMUNITY
DOI
10.1145/3677176
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Inflammatory language posted on social media by college and university students is prevalent, prompting platforms to adopt community safety mechanisms. The escalation of hate speech calls for sophisticated artificial intelligence-based machine learning and deep learning algorithms to detect offensive online content. With a few noteworthy exceptions, most studies on automatic hate speech recognition have focused on high-resource languages, mainly English. We bridge this gap by addressing hate speech detection in Punjabi (Gurmukhi), a low-resource Indo-Aryan language spoken in Indian educational institutions. This research identifies cross-lingual hate speech in the code-switched English-Punjabi language used on social media. It proposes an approach that combines the best-performing hate speech detection techniques to address the gaps and limitations of existing state-of-the-art systems. In this method, Roman Punjabi text is first transliterated, and Bidirectional Encoder Representations from Transformers (BERT)-based models are then employed for hate detection. The proposed model achieves a precision of 0.86 and a recall of 0.83, and higher educational institutions could use it to identify the issues and domains where hate is most prevalent.
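The abstract reports precision and recall but not F1. The following is a minimal sketch, not the authors' evaluation code, showing how these metrics are computed from binary labels (1 = hateful, 0 = not hateful; the toy labels are hypothetical). It also shows the F1 implied by the reported 0.86 / 0.83 pair. This assumes both figures refer to the same class or the same averaging scheme, which the record does not state.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives, false positives and false negatives for the positive class
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of a precision/recall pair."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = hateful, 0 = not hateful
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)

# F1 implied by the reported figures, under the same-class/same-averaging assumption
print(round(f1_from(0.86, 0.83), 2))  # 0.84
```

If the paper's precision and recall were macro-averaged over classes, the true macro-F1 can differ slightly from this harmonic mean. Treat 0.84 as an approximation, not a reported result.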
Pages: 23