Creating sentiment lexicon for sentiment analysis in Urdu: The case of a resource-poor language

Cited by: 35
Authors
Asghar, Muhammad Zubair [1 ]
Sattar, Anum [1 ]
Khan, Aurangzeb [2 ]
Ali, Amjad [3 ]
Kundi, Fazal Masud [1 ]
Ahmad, Shakeel [4 ]
Affiliations
[1] Gomal Univ, ICIT, Dera Ismail Khan, KP, Pakistan
[2] Univ Sci & Technol, Dept Comp Sci, Bannu, Pakistan
[3] Univ Swat, Dept Comp & Software Technol, Saidu Sharif, Pakistan
[4] King Abdul Aziz Univ KAU, FCITR, Jeddah, Saudi Arabia
Keywords
polarity lexicon; sentiment analysis; Urdu sentiment lexicon; Urdu SentiWordNet; FRAMEWORK
DOI
10.1111/exsy.12397
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentiment analysis (SA) applications are becoming popular among individuals and organizations for gathering and analysing users' sentiments about products, services, policies, and current affairs. Due to the availability of a wide range of English lexical resources, such as part-of-speech taggers, parsers, and polarity lexicons, the development of sophisticated SA applications for the English language has attracted many researchers. Although there have been efforts to create polarity lexicons in non-English languages such as Urdu, they suffer from many deficiencies, such as a lack of publicly available sentiment lexicons with a proper scoring mechanism for opinion words and modifiers. In this work, we present a word-level translation scheme for creating a first comprehensive Urdu polarity resource, "Urdu Lexicon", using a merger of existing resources: a list of English opinion words, SentiWordNet, an English-Urdu bilingual dictionary, and a collection of Urdu modifiers. We assign two polarity scores, positive and negative, to each Urdu opinion word. Moreover, modifiers are collected, classified, and tagged with proper polarity scores. We also perform an extrinsic evaluation in terms of subjectivity detection and sentiment classification, and the evaluation results show that the polarity scores assigned by this technique are more accurate than those of the baseline methods.
Pages: 19
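The word-level translation scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny word list, the polarity scores, the dictionary entries, and the modifier weights below are toy values chosen for illustration, and the names `Polarity`, `build_urdu_lexicon`, and `apply_modifier` are hypothetical. The handling of modifiers (scaling for intensifiers, swapping scores for negators) is likewise an assumption, since the abstract only says that modifiers are collected, classified, and tagged with polarity scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polarity:
    """Positive and negative scores for one opinion word, each in [0, 1]."""
    pos: float
    neg: float


# Toy stand-ins for the four resources named in the abstract (all values assumed).
ENGLISH_OPINION_WORDS = ["good", "bad", "excellent"]
ENGLISH_SCORES = {  # stand-in for scores looked up in SentiWordNet
    "good": Polarity(0.75, 0.0),
    "bad": Polarity(0.0, 0.625),
    "excellent": Polarity(1.0, 0.0),
}
EN_UR_DICTIONARY = {  # stand-in for an English-Urdu bilingual dictionary
    "good": ["اچھا"],
    "bad": ["برا", "خراب"],
    "excellent": ["بہترین"],
}
URDU_MODIFIERS = {  # modifier -> (class, weight); classes and weights assumed
    "بہت": ("intensifier", 1.5),
    "نہیں": ("negator", 1.0),
}


def build_urdu_lexicon(words, scores, dictionary):
    """Carry each English word's scores over to its Urdu translations."""
    lexicon = {}
    for word in words:
        if word not in scores:
            continue  # no polarity information available for this word
        for urdu_word in dictionary.get(word, []):
            # Keep the first score seen if two English words share a translation.
            lexicon.setdefault(urdu_word, scores[word])
    return lexicon


def apply_modifier(polarity, modifier):
    """Adjust a word's scores for a preceding or following modifier."""
    kind, weight = URDU_MODIFIERS[modifier]
    if kind == "negator":
        # Assumed behaviour: negation swaps the positive and negative scores.
        return Polarity(pos=polarity.neg, neg=polarity.pos)
    # Assumed behaviour: intensifiers scale both scores, capped at 1.0.
    return Polarity(min(1.0, polarity.pos * weight), min(1.0, polarity.neg * weight))


lexicon = build_urdu_lexicon(ENGLISH_OPINION_WORDS, ENGLISH_SCORES, EN_UR_DICTIONARY)
for urdu_word, polarity in lexicon.items():
    print(urdu_word, polarity.pos, polarity.neg)
```

In this sketch every Urdu translation simply inherits the scores of its English source word. A real pipeline would also have to deal with word-sense ambiguity and with translations shared by several English words, which the sketch sidesteps by keeping the first score encountered.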