Identification of Sarcasm in Textual Data: A Comparative Study

Cited by: 0
Authors
Pulkit Mehndiratta [1]
Devpriya Soni [1]
Affiliation
[1] Jaypee Institute of Information Technology
Keywords
Machine learning; Artificial neural networks; Word embedding; Text vectorization; Accuracy;
DOI
Not available
Chinese Library Classification
TP391.1 [Text information processing]; TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 081203 ; 0835 ; 1405 ;
Abstract
Purpose: The ever-increasing penetration of the Internet into our lives has led to the generation of an enormous amount of multimedia content online, and textual data makes up a major share of the data generated on the World Wide Web. Understanding people’s sentiment is an important aspect of natural language processing, but an inferred opinion can be biased or incorrect if people use sarcasm while commenting, posting status updates, or reviewing a product or a movie. It is therefore of utmost importance to detect sarcasm correctly in order to predict people’s intentions accurately.
Design/methodology/approach: This study evaluates various machine learning models along with standard and hybrid deep learning models across several standardized datasets. Text is vectorized using word embedding techniques, which convert the textual data into vectors for analysis. We used three standardized datasets available in the public domain and three word embeddings, i.e., Word2Vec, GloVe, and fastText, to validate the hypothesis.
Findings: The results were analyzed and conclusions drawn. The key finding is that the hybrid models that include Bidirectional Long Short-Term Memory (Bi-LSTM) and Convolutional Neural Network (CNN) outperform the other conventional machine learning and deep learning models across all the datasets considered in this study, which validates our hypothesis.
Research limitations: Using data from different sources and customizing the models for each dataset slightly decreases the usability of the technique. Overall, however, this methodology provides effective measures to identify the presence of sarcasm, with an average accuracy of at least 80% for one dataset and results better than the current baseline for the other datasets.
Practical implications: The results provide solid insights for system developers who want to integrate this model into real-time analysis of any review or comment posted in the public domain. The study has various other practical implications for businesses that depend on user ratings and public opinion. It also provides a launching platform for researchers working on the problem of sarcasm identification in textual data.
Originality/value: This is a first-of-its-kind study that shows the difference between conventional and hybrid methods for predicting sarcasm in textual data. The study also provides possible indicators that hybrid models perform better when applied to textual data for the analysis of sarcasm.
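The vectorization step mentioned in the abstract (converting text into vectors through word embeddings) can be illustrated with a minimal sketch. It is not the authors' pipeline: the embedding table `TOY_EMBEDDINGS`, its values, and the helper names `tokenize` and `vectorize` are hypothetical, and averaging token vectors is only one simple pooling choice that suits conventional classifiers; sequence models such as Bi-LSTM or CNN would normally consume the per-token vectors directly.

```python
from typing import Dict, List

# Toy 3-dimensional embedding table. The values are invented for illustration;
# a real pipeline would load pretrained Word2Vec, GloVe, or fastText vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "great": [1.0, 0.0, 2.0],
    "another": [0.0, 1.0, 0.0],
    "monday": [2.0, 2.0, 1.0],
}
EMBEDDING_DIM = 3
PUNCTUATION = ".,!?;:\"'"


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (token.strip(PUNCTUATION) for token in text.lower().split())
    return [token for token in stripped if token]


def vectorize(text: str) -> List[float]:
    """Average the embeddings of known tokens (zero vector if none are known)."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokenize(text) if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBEDDING_DIM
    # zip(*vectors) groups the i-th component of every token vector together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


if __name__ == "__main__":
    print(vectorize("Great, another Monday!"))
```

Each comment or review then becomes a fixed-length numeric vector that a classifier can take as input, whichever embedding supplies the per-word values.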
Pages: 56-83
Page count: 28
Related papers
50 in total
  • [1] Identification of Sarcasm in Textual Data: A Comparative Study
    Mehndiratta, Pulkit
    Soni, Devpriya
    [J]. JOURNAL OF DATA AND INFORMATION SCIENCE, 2019, 4 (04) : 56 - 83
  • [2] Identification of Sarcasm in Textual Data: A Comparative Study
    Pulkit Mehndiratta
    Devpriya Soni
    [J]. Journal of Data and Information Science, 2019, 4 (04) : 56 - 83
  • [3] A Comprehensive Study of Classification Techniques for Sarcasm Detection on Textual Data
    Dave, Anandkumar D.
    Desai, Nikita P.
    [J]. 2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, AND OPTIMIZATION TECHNIQUES (ICEEOT), 2016, : 1985 - 1991
  • [4] Sarcasm identification in textual data: systematic review, research challenges and open directions
    Christopher Ifeanyi Eke
    Azah Anir Norman
    Liyana Shuib
    Henry Friday Nweke
    [J]. Artificial Intelligence Review, 2020, 53 : 4215 - 4258
  • [5] Sarcasm identification in textual data: systematic review, research challenges and open directions
    Eke, Christopher Ifeanyi
    Norman, Azah Anir
    Shuib, Liyana
    Nweke, Henry Friday
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 (06) : 4215 - 4258
  • [6] Sarcasm Detection of Tweets: A comparative Study
    Jain, Tanya
    Agrawal, Nilesh
    Goyal, Garima
    Aggrawal, Niyati
    [J]. 2017 TENTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3), 2017, : 189 - 194
  • [7] TagPies: Comparative Visualization of Textual Data
    Jaenicke, Stefan
    Blumenstein, Judith
    Ruecker, Michaela
    Zeckzer, Dirk
    Scheuermann, Gerik
    [J]. VISIGRAPP 2018: PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS / INTERNATIONAL CONFERENCE ON INFORMATION VISUALIZATION THEORY AND APPLICATIONS (IVAPP), VOL 3, 2018, : 40 - 51
  • [8] Comparative study on textual data set using fuzzy clustering algorithms
    Sadika, Rjiba
    Soltani, Moez
    Benammou, Saloua
    [J]. KYBERNETES, 2016, 45 (08) : 1232 - 1242
  • [9] How to Detect Novelty in Textual Data Streams? A Comparative Study of Existing Methods
    Christophe, Clement
    Velcin, Julien
    Cugliari, Jairo
    Suignard, Philippe
    Boumghar, Manel
    [J]. ADVANCED ANALYTICS AND LEARNING ON TEMPORAL DATA, AALTD 2019, 2020, 11986 : 110 - 125
  • [10] Identification of nonliteral language in social media: A case study on sarcasm
    Muresan, Smaranda
    Gonzalez-Ibanez, Roberto
    Ghosh, Debanjan
    Wacholder, Nina
    [J]. JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2016, 67 (11) : 2725 - 2737