Sarcasm identification in textual data: systematic review, research challenges and open directions

Cited by: 43
Authors
Eke, Christopher Ifeanyi [1, 2]
Norman, Azah Anir [1]
Shuib, Liyana [1]
Nweke, Henry Friday [1, 3]
Affiliations
[1] Univ Malaya, Dept Informat Syst, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia
[2] Fed Univ, Fac Sci, Dept Comp Sci, PMB 146, Lafia, Nasarawa State, Nigeria
[3] Ebonyi State Univ, Comp Sci Dept, PMB 053, Abakaliki, Ebonyi State, Nigeria
Keywords
Sarcasm identification; Social media data; Natural language processing; Pre-processing; Feature engineering; Textual classification; Performance measure; SOCIAL MEDIA; CLASSIFICATION; SELECTION; TWEETS;
DOI
10.1007/s10462-019-09791-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sarcasm is a form of sentiment in which people convey implicit information, usually the opposite of the literal message content, in order to hurt someone emotionally or to criticise something in a humorous way. Sarcasm identification in textual data, one of the hardest challenges in natural language processing (NLP), has recently become an active research area because of its importance in improving sentiment analysis of social media data. Few studies have carried out a comprehensive literature review of the primary studies on sarcasm identification published within the last 11 years. This study therefore reviews classification techniques for sarcasm identification with respect to datasets, pre-processing, feature engineering, classification algorithms, and performance metrics. The study considered articles published from 2008 to 2019. Forty (40) academic articles were selected from seven standard academic databases to carry out the review and realize the objectives. The study revealed that most researchers created their own datasets, since no standard dataset is available in the domain of sarcasm identification. Context-based and content-based linguistic features were used in most of the studies. The review shows that n-grams and part-of-speech tagging were the most commonly used feature extraction techniques. Binary representation and term frequency were used for feature representation, whereas chi-squared and information gain were used for feature selection. Moreover, classification algorithms such as support vector machine, Naive Bayes, random forest, maximum entropy, and decision tree were most often applied, with accuracy, precision, recall, and F-measure as performance measures. Finally, research challenges and future directions are summarized. This review reveals the impact of sarcasm identification in building effective product reviews and would serve as a handy resource for researchers and practitioners in sarcasm identification and text classification in general.
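As an illustration of three notions named in the abstract (n-gram feature extraction, binary versus term-frequency representation, and the accuracy, precision, recall, and F-measure scores), here is a minimal sketch in plain Python. It is not taken from the reviewed paper: the function names, the whitespace tokenizer, and the toy labels are assumptions chosen for the example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams from lowercased, whitespace-split text (assumed tokenizer)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(text, n=1):
    """Term-frequency representation: each n-gram maps to its count."""
    return Counter(word_ngrams(text, n))


def binary_representation(text, n=1):
    """Binary representation: each n-gram present in the text maps to 1."""
    return {gram: 1 for gram in set(word_ngrams(text, n))}


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    print(word_ngrams("Oh great another Monday", 2))
    print(term_frequency("great great job"))
    # Hypothetical labels: 1 = sarcastic, 0 = not sarcastic.
    print(evaluate([1, 0, 1, 1], [1, 1, 0, 1]))
```

In practice the feature dictionaries would be fed to a classifier such as those listed in the abstract; that step is omitted here to keep the sketch dependency-free.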
Pages: 4215-4258
Page count: 44
Related papers
50 in total
  • [31] A key review on security and privacy of big data: issues, challenges, and future research directions
    Doygun Demirol
    Resul Das
    Davut Hanbay
    [J]. Signal, Image and Video Processing, 2023, 17 : 1335 - 1343
  • [32] Methodological and ethical challenges in designing and conducting research at the end of life: A systematic review of qualitative and textual evidence
    Vlckova, Karolina
    Gonella, Silvia
    Bavelaar, Laura
    Mitchell, Gary
    Sussman, Tamara
    [J]. INTERNATIONAL JOURNAL OF NURSING PRACTICE, 2023,
  • [33] A key review on security and privacy of big data: issues, challenges, and future research directions
    Demirol, Doygun
    Das, Resul
    Hanbay, Davut
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 1335 - 1343
  • [34] Paternity leave: A systematic review and directions for research
    Pizarro, Jon
    Gartzia, Leire
    [J]. HUMAN RESOURCE MANAGEMENT REVIEW, 2024, 34 (01)
  • [35] Clinicians and accounting: A systematic review and research directions
    Oppi, Chiara
    Campanale, Cristina
    Cinquini, Lino
    Vagnoni, Emidia
    [J]. FINANCIAL ACCOUNTABILITY & MANAGEMENT, 2019, 35 (03) : 290 - 312
  • [36] Consumer confusion: a systematic review and research directions
    Chauhan, Vishakha
    Sagar, Mahim
    [J]. JOURNAL OF CONSUMER MARKETING, 2021, 38 (04) : 445 - 456
  • [37] Comprehensive Review of Multimodal Medical Data Analysis: Open Issues and Future Research Directions
    Shetty, Shashank
    Ananthanarayana, V. S.
    Mahale, Ajit
    [J]. ACTA INFORMATICA PRAGENSIA, 2022, 11 (03) : 423 - 457
  • [38] Social Exergames in Health and Wellness: A Systematic Review of Trends, Effectiveness, Challenges, and Directions for Future Research
    Chan, Gerry
    Banire, Bilikis
    Anukem, Sussan
    Imran, Masud
    Meena, Suraj
    Nwagu, Chukwuemeka
    Oyebode, Oladapo
    Alslaity, Alaa
    Arya, Ali
    Orji, Rita
    [J]. INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2024,
  • [39] Systematic Literature Review of Information Extraction From Textual Data: Recent Methods, Applications, Trends, and Challenges
    Abdullah, Mohd Hafizul Afifi
    Aziz, Norshakirah
    Abdulkadir, Said Jadid
    Alhussian, Hitham Seddig Alhassan
    Talpur, Noureen
    [J]. IEEE ACCESS, 2023, 11 : 10535 - 10562
  • [40] Learning from imbalanced data: open challenges and future directions
    Krawczyk B.
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE, 2016, 5 (04) : 221 - 232