Hate Speech and Target Community Detection in Nastaliq Urdu Using Transfer Learning Techniques

Cited by: 0
Authors
Malik, Muhammad Shahid Iqbal [1, 2]
Nawaz, Aftab [3]
Jamjoom, Mona Mamdouh [4]
Affiliations
[1] Univ Wah, Dept Comp Sci, Wah Cantt 47040, Pakistan
[2] Natl Res Univ Higher Sch Econ, Dept Comp Sci, Moscow 109028, Russia
[3] COMSATS Univ Islamabad, Dept Comp Sci, Attock Campus, Attock 43600, Pakistan
[4] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11671, Saudi Arabia
Source
IEEE ACCESS | 2024 / Volume 12
Keywords
Hate speech; Social networking (online); Task analysis; Bidirectional control; Transformers; Support vector machines; Encoding; Natural language processing; Nastaliq Urdu; target community; hate speech; DistilBERT; fine-tuning; Facebook; LANGUAGE
DOI
10.1109/ACCESS.2024.3444188
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Freedom of expression on social media has given oppressed people many opportunities to raise their voices against violence and injustice, but this freedom is also being misused to spread various forms of hate speech. Several studies have addressed hate speech detection in high-resource languages; however, work on under-resourced languages is very limited, especially for Nastaliq Urdu. Pakistan has been dealing with hateful and violence-inciting content for the last two decades. This study therefore addresses hate speech detection and fine-grained multi-class target community identification in Nastaliq Urdu. Using the transfer learning paradigm, two benchmark Urdu transformer models are explored with fine-tuning. A Nastaliq Urdu Hate Speech and Target Community (HSTC) corpus is built by collecting posts from Pakistani Facebook accounts. In particular, the strengths of the Urdu Robustly Optimized BERT Pre-Training Approach (Urdu-RoBERTa) and Urdu Distilled Bidirectional Encoder Representations from Transformers (Urdu-DistilBERT) are explored to design an automated system that does not rely on hand-crafted features. The proposed framework consists of four steps: 1) data cleaning and preprocessing; 2) data transformation; 3) grid search for the fine-tuning process; and 4) classification (binary and multi-class). Results on the Nastaliq Urdu corpus show that the proposed system achieves benchmark performance for binary classification (hate speech detection) and for target community detection (multi-class classification) on hateful Facebook posts. In particular, the fine-tuned DistilBERT achieved 86.58% accuracy and an 86.52% F1-score for binary classification, outperforming sixteen baselines. Furthermore, it achieved 84.17% accuracy and an 83.91% F1-score for target community (religious, political, and gender-based) identification, outperforming all baselines.
The findings of this study can be beneficial in detecting and filtering out hate speech in Nastaliq Urdu on the Facebook platform.
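As a rough illustration of steps 3 and 4 in the abstract, the sketch below shows an exhaustive grid search over fine-tuning hyperparameters together with the two reported metric types, accuracy and F1-score. It is not the authors' implementation: the hyperparameter names and values, the `train_and_score` callback, and the choice of macro-averaged F1 are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def grid_search(param_grid, train_and_score):
    """Try every combination in param_grid and keep the best-scoring one.

    train_and_score is a hypothetical callback: it would fine-tune a model
    with the given parameters and return a validation score.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; the paper's actual grid is not given in the abstract.
example_grid = {"learning_rate": [1e-5, 2e-5, 5e-5], "batch_size": [16, 32]}
```

In practice `train_and_score` would wrap a fine-tuning run of a transformer classifier and return, for example, `macro_f1` on a held-out validation split; the same loop serves both the binary task and the three-class target community task.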
Pages: 116875-116890
Page count: 16