A TfidfVectorizer and SVM based sentiment analysis framework for text data corpus

Cited by: 20
Authors
Kumar, Vipin [1 ]
Subba, Basant [1 ]
Affiliations
[1] Natl Inst Technol Hamirpur, Dept Comp Sci & Engn, Hamirpur 177005, Himachal Pradesh, India
Keywords
Sentiment Analysis; TfidfVectorizer; Support Vector Machine (SVM); Amazon dataset; IMDB dataset;
DOI
10.1109/ncc48643.2020.9056085
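Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology];
Discipline classification code
0809;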
Abstract
E-commerce and social networking sites depend heavily on available data, which can be analyzed in real time to inform their future business strategies. However, manually analyzing huge amounts of data is not feasible within business time constraints. Automated sentiment analysis, which determines sentiment from a text corpus without manual effort, therefore plays an important role today. Many sentiment analysis frameworks with state-of-the-art results have been proposed in the literature. However, many of these frameworks have low accuracy on text corpora that contain emoticons and special text. In addition, many of them are energy- and computation-intensive, which limits their real-time deployment. In this paper, we aim to address these issues by proposing a novel sentiment analysis framework based on Support Vector Machine (SVM). The proposed framework uses a novel technique to tokenize the text documents, in which stop words, special characters, and emoticons present in the documents are eliminated. In addition, words with similar meanings and annotations are grouped into one type using stemming. The pre-processed, tokenized documents are then vectorized into n-gram integer vectors using the 'TfidfVectorizer' for use as input to the SVM-based machine learning classifier. Experimental results on Amazon's electronics item review dataset and IMDB's movie review corpus show that the proposed framework achieves high performance compared to other similar frameworks proposed in the literature.
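The pipeline summarized in the abstract (tokenize, drop stop words and emoticons, stem, vectorize with n-gram TF-IDF, classify with an SVM) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the stemmer, the n-gram range, or the SVM kernel, so `crude_stem`, `ngram_range=(1, 2)`, and the linear kernel are placeholders, and the six training sentences are invented toy data. Note also that scikit-learn's `TfidfVectorizer` produces floating-point TF-IDF weights.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def crude_stem(token):
    """Hypothetical, very rough suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, keep alphabetic runs (drops emoticons, digits, punctuation),
    discard stop words and single letters, then stem what is left."""
    words = re.findall(r"[a-z]+", text.lower())
    return [
        crude_stem(w)
        for w in words
        if len(w) > 1 and w not in ENGLISH_STOP_WORDS
    ]


def build_model():
    """TF-IDF over unigrams and bigrams feeding a linear-kernel SVM.
    The n-gram range and kernel are assumptions, not taken from the paper."""
    vectorizer = TfidfVectorizer(
        tokenizer=tokenize,   # use the custom preprocessing above
        token_pattern=None,   # unused when a tokenizer is supplied
        lowercase=False,      # tokenize() already lowercases
        ngram_range=(1, 2),
    )
    return make_pipeline(vectorizer, SVC(kernel="linear"))


# Invented toy reviews: 1 = positive, 0 = negative.
train_texts = [
    "Loved this phone, works great :)",
    "Great movie, amazing acting!!!",
    "Excellent sound and amazing battery",
    "Terrible product, stopped working :(",
    "Awful movie, boring and terrible acting",
    "Bad battery, awful sound",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = build_model()
model.fit(train_texts, train_labels)
predictions = model.predict(["great amazing phone", "terrible awful product"])
print(predictions.tolist())
```

In practice the crude stemmer would likely be replaced by an established one (for example a Porter stemmer), and the kernel and regularization strength would be tuned on held-out data from the review corpora.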
Pages: 6
Related papers
50 in total
  • [1] Research on text sentiment analysis based on CNNs and SVM
    Chen, Yuling
    Zhang, Zhi
    [J]. PROCEEDINGS OF THE 2018 13TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2018), 2018, : 2731 - 2734
  • [2] Framework for Sentiment Analysis of Arabic Text
    Almuqren, Latifah
    Cristea, Alexandra I.
    [J]. PROCEEDINGS OF THE 27TH ACM CONFERENCE ON HYPERTEXT AND SOCIAL MEDIA (HT'16), 2016, : 315 - 317
  • [3] Sentiment recognition and analysis method of official document text based on BERT–SVM model
    Shule Hao
    Peng Zhang
    Sen Liu
    Yuhang Wang
    [J]. Neural Computing and Applications, 2023, 35 : 24621 - 24632
  • [4] Sentiment analysis of Japanese text and vocabulary learning based on natural language processing and SVM
    Song, Gang
    [J]. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021,
  • [5] A Novel Approach for Sentiment Analysis of Punjabi Text using SVM
    Kaur, Amandeep
    Gupta, Vishal
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2017, 14 (05) : 707 - 712
  • [6] TEXT BASED SENTIMENT ANALYSIS
    Nandi, Biswarup
    Ghanti, Mousumi
    Paul, Souvik
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTING AND INFORMATICS (ICICI 2017), 2017, : 9 - 13
  • [7] Sentiment Miner: A Prototype for Sentiment Analysis of Unstructured Data and Text
    Shahbaz, Muhammad
    Guergachi, Aziz
    Rehman, Rana Tanzeel ur
    [J]. 2014 IEEE 27TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2014,
  • [8] Deriving anti-epidemic policy from public sentiment: A framework based on text analysis with microblog data
    Zhao, Sijia
    Chen, Lixuan
    Liu, Ying
    Yu, Muran
    Han, Han
    [J]. PLOS ONE, 2022, 17 (08):
  • [9] Sentiment recognition and analysis method of official document text based on BERT-SVM model
    Hao, Shule
    Zhang, Peng
    Liu, Sen
    Wang, Yuhang
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (35) : 24621 - 24632
  • [10] Sinhala Sentiment Analysis using Corpus based Sentiment Lexicon
    Chathuranga, P. D. T.
    Lorensuhewa, S. A. S.
    Kalyani, M. A. L.
    [J]. 2019 19TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER - 2019), 2019,