Exploiting Linguistic Features for Effective Sentence-Level Sentiment Analysis in Urdu Language

Cited by: 7
Authors
Altaf, Amna [1]
Anwar, Muhammad Waqas [1]
Jamal, Muhammad Hasan [1]
Bajwa, Usama Ijaz [1]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Lahore Campus 1-5 Km Def Rd Raiwind Rd, Lahore, Punjab, Pakistan
Keywords
Supervised Machine Learning; Parts of Speech Tagging; Sentiment Analysis; Urdu Language; SELECTION;
DOI
10.1007/s11042-023-15216-0
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The rapid increase in social media use has led to the generation of gigabytes of information shared by billions of users worldwide. To analyze this information and determine how people respond to different events, researchers widely use sentiment analysis. Existing studies in Urdu sentiment analysis mostly use traditional n-gram features, which, unlike linguistic features, do not capture the contextual information being discussed. Moreover, no existing study classifies the sentiment of proverbs and idioms, which is challenging because they mostly contain no sentiment words yet carry strong sentiment. This study exploits linguistic features of the Urdu language for sentence-level sentiment analysis and classifies idioms and proverbs using classical machine learning techniques. We develop a dataset comprising idioms, proverbs, and sentences from the news domain, and extract part-of-speech tag-based, boolean, and numeric features from it after careful linguistic analysis of the Urdu language. Experimental results show that the J48 classifier performs best in sentiment classification, with an accuracy of 90% and an F-measure of 88%.
Pages: 41813-41839
Number of pages: 27