Artificial Intelligence inspired method for cross-lingual cyberhate detection from low resource languages

被引:0
|
作者
Kaur, Manpreet [1 ]
Saini, Munish [1 ]
机构
[1] Guru Nanak Dev Univ, Dept Comp Engn & Technol, Amritsar, Punjab, India
关键词
Artificial Intelligence; cross-lingual; cyberhate; low resource languages; social media; HIGHER-EDUCATION; HATE SPEECH; COMMUNITY;
D O I
10.1145/3677176
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The appearance of inflammatory language on social media by college or university students is quite prevalent, inspiring platforms to engage in community safety mechanisms. Escalating hate speech entails creating sophisticated artificial intelligence-based, machine learning, and deep learning algorithms to detect offensive internet content. With a few noteworthy exceptions, the majority of the studies on automatic hate speech recognition have emphasized high-resource languages, mainly English. We bridge this gap by addressing hate speech detection in Punjabi (Gurmukhi), a low-resource Indo-Aryan language articulated in Indian educational institutions. This research identifies cross-lingual hate speech in the code-switched English-Punjabi language used on social media. It proposes an approach combining the best hate speech detection techniques to cover existing state-of-the-art system gaps and limitations. In this method, the Roman Punjabi is transliterated, and then Bidirectional Encoder Representations from Transformer (BERT) based models are employed for hate detection. The proposed model has achieved 0.86 precision and 0.83 recall, and various higher educational institutions could employ it to discover the issues/domains where hate prevails the most.
引用
收藏
页数:23
相关论文
共 50 条
  • [41] Cross-lingual transfer learning during supervised training in low resource scenarios
    Das, Amit
    Hasegawa-Johnson, Mark
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3531 - 3535
  • [42] Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification
    Zhang, Mozhi
    Fujinuma, Yoshinari
    Boyd-Graber, Jordan
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9547 - 9554
  • [43] CROSS-LINGUAL SPEECH RECOGNITION BETWEEN LANGUAGES FROM THE SAME LANGUAGE FAMILY
    Zgank, Andrej
    PROCEEDINGS OF THE ROMANIAN ACADEMY SERIES A-MATHEMATICS PHYSICS TECHNICAL SCIENCES INFORMATION SCIENCE, 2019, 20 (02): : 184 - 191
  • [44] MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning
    Xia, Mengzhou
    Zheng, Guoqing
    Mukherjee, Subhabrata
    Shokouhi, Milad
    Neubig, Graham
    Awadallah, Ahmed Hassan
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 499 - 511
  • [45] Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition
    Qian, Yanmin
    Liu, Jia
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2581 - 2584
  • [46] Unsupervised Stem-based Cross-lingual Part-of-Speech Tagging for Morphologically Rich Low-Resource Languages
    Eskander, Ramy
    Lowry, Cass
    Khandagale, Sujay
    Klavans, Judith
    Polinsky, Maria
    Muresan, Smaranda
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4061 - 4072
  • [47] Good Meta-tasks Make A Better Cross-lingual Meta-transfer Learning for Low-resource Languages
    Wu, Linjuan
    Guo, Zongyi
    Cui, Baoliang
    Tang, Haihong
    Lu, Weiming
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 7431 - 7446
  • [48] Multilingual and Cross-Lingual Intent Detection from Spoken Data
    Gerz, Daniela
    Su, Pei-Hao
    Kusztos, Razvan
    Mondal, Avishek
    Lis, Michal
    Singhal, Eshan
    Mrksic, Nikola
    Wen, Tsung-Hsien
    Vulic, Ivan
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 7468 - 7475
  • [49] A Low Cost Machine Translation Method for Cross-Lingual Information Retrieval
    Bracewell, David B.
    Ren, Fuji
    Kuroiwa, Shingo
    ENGINEERING LETTERS, 2008, 16 (01)
  • [50] Cross-Lingual Summarization Method Based on Joint Training and Self-Training in Low-Resource Scenarios
    Cheng, Shaohuan
    Tang, Yujia
    Liu, Qiao
    Chen, Wenyu
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2024, 53 (05): : 762 - 770