A Roman Urdu Corpus for sentiment analysis

Cited by: 1
Authors
Khan, Marwa [1]
Naseer, Asma [1]
Wali, Aamir [1]
Tamoor, Maria [2]
Affiliations
[1] Natl Univ Comp & Emerging Sci, FAST Sch Comp, 852-B, Lahore, Pakistan
[2] Forman Christian Coll Univ, Dept Comp Sci, Zahoor Ilahi Rd, Lahore, Pakistan
DOI
10.1093/comjnl/bxae052
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis is a dynamic field focused on understanding and predicting emotional sentiments in text or images. With the prevalence of smartphones, e-commerce and social networks, individuals readily express opinions, aiding businesses, political analysts and organizations in decision-making. Despite extensive research in sentiment analysis for various languages, challenges persist in low-resource languages like Roman Urdu. Roman Urdu, the use of Roman script to write Urdu, has gained popularity, yet limited linguistic resources hinder sentiment analysis research. This study addresses this gap by developing a bidirectional long short-term memory network with FastText embeddings and additional layers. A large Roman Urdu corpus for sentiment analysis, consisting of over 51 000 reviews, is created, and the proposed model is trained and compared with 14 other models, demonstrating an accuracy of 0.854 and an F1-score of 0.84.
Pages: 13
Related papers
50 in total
  • [1] Sentiment Analysis for Roman Urdu
    Rafique, Ayesha
    Malik, Muhammad Kamran
    Nawaz, Zubair
    Bukhari, Faisal
    Jalbani, Akhtar Hussain
    [J]. MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY, 2019, 38 (02) : 463 - 470
  • [2] Sentiment Analysis System for Roman Urdu
    Mehmood, Khawar
    Essam, Daryl
    Shafi, Kamran
    [J]. INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 29 - 42
  • [3] RUSAS: Roman Urdu Sentiment Analysis System
    Jawad, Kazim
    Ahmad, Muhammad
    Alvi, Majdah
    Alvi, Muhammad Bux
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 79 (01) : 1463 - 1480
  • [4] A Review of Urdu Sentiment Analysis with Multilingual Perspective: A Case of Urdu and Roman Urdu Language
    Khan, Ihsan Ullah
    Khan, Aurangzeb
    Khan, Wahab
    Su'ud, Mazliham Mohd
    Alam, Muhammad Mansoor
    Subhan, Fazli
    Asghar, Muhammad Zubair
    [J]. COMPUTERS, 2022, 11 (01)
  • [5] Roman Urdu Sentiment Analysis Using Transfer Learning
    Li, Dun
    Ahmed, Kanwal
    Zheng, Zhiyun
    Mohsan, Syed Agha Hassnain
    Alsharif, Mohammed H.
    Hadjouni, Myriam
    Jamjoom, Mona M.
    Mostafa, Samih M.
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (20)
  • [6] An Unsupervised Approach for Sentiment Analysis on Social Media Short Text Classification in Roman Urdu
    Rana, Toqir A.
    Shahzadi, Kiran
    Rana, Tauseef
    Arshad, Ahsan
    Tubishat, Mohammad
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (02)
  • [7] Roman-Urdu-Parl: Roman-Urdu and Urdu Parallel Corpus for Urdu Language Understanding
    Alam, Mehreen
    Ul Hussain, Sibt
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (01)
  • [8] An unsupervised lexical normalization for Roman Hindi and Urdu sentiment analysis
    Mehmood, Khawar
    Essam, Daryl
    Shafi, Kamran
    Malik, Muhammad Kamran
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (06)
  • [9] Sentiment Analysis for a Resource Poor Language-Roman Urdu
    Mehmood, Khawar
    Essam, Daryl
    Shafi, Kamran
    Malik, Muhammad Kamran
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (01)
  • [10] Discriminative Feature Spamming Technique for Roman Urdu Sentiment Analysis
    Mehmood, Khawar
    Essam, Daryl
    Shafi, Kamran
    Malik, Muhammad Kamran
    [J]. IEEE ACCESS, 2019, 7 : 47991 - 48002