UHated: hate speech detection in Urdu language using transfer learning

Cited by: 5
Authors
Arshad, Muhammad Umair [1]
Ali, Raza [1]
Beg, Mirza Omer [1]
Shahzad, Waseem [1]
Affiliations
[1] National University of Computer and Emerging Sciences, Islamabad, Pakistan
Keywords
Hate speech detection; Deep learning; Language semantics; Twitter; Social network analysis; Low-resource languages
DOI
10.1007/s10579-023-09642-7
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Social media has become a driving force for social change in the global society. Events that take place in one part of the world can quickly reverberate across the globe due to the vast amount of data generated on these platforms. However, developers of these platforms face numerous challenges in keeping cyberspace as inclusive and healthy as possible. In recent years, there has been an increase in offensive and hate speech on social media. Manual efforts to address this issue have been inadequate due to the vast scope of the problem. Therefore, there is a need for an automated technique that can detect and remove offensive and hateful comments before they can cause harm. In this research, we use transfer learning to utilize pre-trained FastText Urdu word embeddings and multi-lingual BERT embeddings (RoBERTa) for our task. We also develop an Urdu language hate lexicon and use it to create an annotated dataset of 7800 Urdu tweets. Our results show that RoBERTa is able to achieve a macro F1-score of 0.82 on our multi-class classification task, outperforming deep learning and machine learning baseline models.
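The macro F1-score reported in the abstract is the unweighted mean of the per-class F1-scores, so every class counts equally regardless of how many tweets it contains. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `macro_f1` and the three class labels in the usage note are hypothetical; the abstract does not list the classes used in the dataset.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 over all classes seen."""
    # Classes are taken from both lists so a class that is only predicted
    # (never true) still contributes an F1 of 0 to the average.
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with the hypothetical labels `hate`, `offensive`, and `neutral`, calling `macro_f1(["hate", "hate", "offensive"], ["hate", "offensive", "offensive"])` averages the three per-class scores without weighting them by class frequency, which is why the metric suits imbalanced multi-class data.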
Pages: 713 - 732
Page count: 20
Related papers
50 in total
  • [1] UHated: hate speech detection in Urdu language using transfer learning
    Muhammad Umair Arshad
    Raza Ali
    Mirza Omer Beg
    Waseem Shahzad
    [J]. Language Resources and Evaluation, 2023, 57 : 713 - 732
  • [2] Hate Speech and Target Community Detection in Nastaliq Urdu Using Transfer Learning Techniques
    Malik, Muhammad Shahid Iqbal
    Nawaz, Aftab
    Jamjoom, Mona Mamdouh
    [J]. IEEE ACCESS, 2024, 12 : 116875 - 116890
  • [3] Hate-Speech and Offensive Language Detection in Roman Urdu
    Rizwan, Hammad
    Shakeel, Muhammad Haroon
    Karim, Asim
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020 : 2512 - 2522
  • [4] Offensive Language and Hate Speech Detection Based on Transfer Learning
    Touahri, Ibtissam
    Mazroui, Azzeddine
    [J]. ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 2, 2022, 1418 : 300 - 311
  • [5] Hate speech detection on Twitter using transfer learning
    Ali, Raza
    Farooq, Umar
    Arshad, Umair
    Shahzad, Waseem
    Beg, Mirza Omer
    [J]. COMPUTER SPEECH AND LANGUAGE, 2022, 74
  • [6] Hate Speech Detection in Roman Urdu
    Khan, Muhammad Moin
    Shahzad, Khurram
    Malik, Muhammad Kamran
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (01)
  • [7] Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis
    Ali, Muhammad Z.
    Ehsan-Ul-Haq
    Rauf, Sahar
    Javed, Kashif
    Hussain, Sarmad
    [J]. IEEE ACCESS, 2021, 9 : 84296 - 84305
  • [8] Transfer learning for hate speech detection in social media
    Yuan, Lanqin
    Wang, Tianyu
    Ferraro, Gabriela
    Suominen, Hanna
    Rizoiu, Marian-Andrei
    [J]. JOURNAL OF COMPUTATIONAL SOCIAL SCIENCE, 2023, 6 (02) : 1081 - 1101
  • [9] Transfer learning for hate speech detection in social media
    Lanqin Yuan
    Tianyu Wang
    Gabriela Ferraro
    Hanna Suominen
    Marian-Andrei Rizoiu
    [J]. Journal of Computational Social Science, 2023, 6 : 1081 - 1101
  • [10] Hate Speech Detection using Word Embedding and Deep Learning in the Arabic Language Context
    Faris, Hossam
    Aljarah, Ibrahim
    Habib, Maria
    Castillo, Pedro A.
    [J]. ICPRAM: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2020 : 453 - 460