A TfidfVectorizer and SVM based sentiment analysis framework for text data corpus

Cited by: 20
Authors
Kumar, Vipin [1]
Subba, Basant [1]
Affiliation
[1] Natl Inst Technol Hamirpur, Dept Comp Sci & Engn, Hamirpur 177005, Himachal Pradesh, India
Keywords
Sentiment Analysis; TfidfVectorizer; Support Vector Machine (SVM); Amazon dataset; IMDB dataset
DOI
10.1109/ncc48643.2020.9056085
Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology]
Subject classification code
0809
Abstract
E-commerce and social networking sites depend heavily on the data available to them, which can be analyzed in real time to plan future business strategies. However, analyzing such large volumes of data manually is not feasible within business time constraints. Automated sentiment analysis, which determines sentiments directly from a text data corpus, therefore plays an important role today. Many sentiment analysis frameworks with state-of-the-art results have been proposed in the literature. However, many of them have low accuracy on text corpora that contain emoticons and special text, and many are also energy- and computation-intensive, which limits their real-time deployment. In this paper, we aim to address these issues by proposing a novel sentiment analysis framework based on a Support Vector Machine (SVM). The proposed framework uses a novel tokenization technique that eliminates stop words, special characters, and emoticons from the text documents. In addition, words sharing a common stem, and hence typically a similar meaning, are grouped into a single type using stemming. The pre-processed, tokenized documents are then converted into n-gram TF-IDF vectors using 'TfidfVectorizer' and used as input to the SVM-based machine learning classifier. Experimental results on Amazon's electronics item review dataset and the IMDB movie review corpus show that the proposed framework achieves higher performance than other similar frameworks proposed in the literature.
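The pipeline described in the abstract (tokenize, drop stop words, special characters and emoticons, stem, vectorize into n-gram TF-IDF features, classify with an SVM) can be sketched end to end with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the emoticon pattern, the toy suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), the 1-2 gram range, and all function and class names are hypothetical. The TF-IDF weighting mirrors scikit-learn's `TfidfVectorizer` defaults (smoothed IDF, `idf = ln((1 + n) / (1 + df)) + 1`, L2-normalized rows), and the SVM is a linear one without an intercept, trained by stochastic sub-gradient descent on the hinge loss; the paper does not state its kernel or solver. The review sentences are invented examples, not the Amazon or IMDB data.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not specified.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "or", "of", "to", "was", "i"}
# Rough pattern for stand-alone emoticons such as :) :( :-D ;P (an assumption).
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][-']?[()\[\]dDpP/\\|](?!\w)")
SUFFIXES = ("ing", "ed", "ly", "s")


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            if suffix == "s" and word.endswith("ss"):
                continue  # keep words like "boss" intact
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Remove emoticons, special characters and stop words, then stem."""
    text = EMOTICON_RE.sub(" ", text)  # first, so ":D" does not leave a stray "d"
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep only letters and whitespace
    return [crude_stem(w) for w in text.split() if w not in STOP_WORDS]


def ngrams(tokens, n_range=(1, 2)):
    """All contiguous n-grams for n in n_range (inclusive), joined by spaces."""
    out = []
    for n in range(n_range[0], n_range[1] + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


class TfidfSketch:
    """Smoothed-IDF TF-IDF with L2-normalised rows, stored as sparse {index: weight} dicts."""

    def fit(self, docs):
        df = Counter(g for d in docs for g in set(ngrams(tokenize(d))))  # document frequency
        terms, n = sorted(df), len(docs)
        self.vocab = {g: i for i, g in enumerate(terms)}
        self.idf = [math.log((1 + n) / (1 + df[g])) + 1 for g in terms]
        return self

    def transform(self, docs):
        rows = []
        for d in docs:
            counts = Counter(g for g in ngrams(tokenize(d)) if g in self.vocab)
            row = {self.vocab[g]: c * self.idf[self.vocab[g]] for g, c in counts.items()}
            norm = math.sqrt(sum(v * v for v in row.values())) or 1.0
            rows.append({j: v / norm for j, v in row.items()})
        return rows


class LinearSVMSketch:
    """Hinge loss + L2 penalty, stochastic sub-gradient descent, no intercept; labels +1/-1."""

    def __init__(self, dim, lam=0.01, lr=0.1, epochs=50, seed=0):
        self.w = [0.0] * dim
        self.lam, self.lr, self.epochs, self.seed = lam, lr, epochs, seed

    def decision(self, x):
        return sum(self.w[j] * v for j, v in x.items())

    def fit(self, X, y):
        rng = random.Random(self.seed)
        order = list(range(len(X)))
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                margin = y[i] * self.decision(X[i])
                shrink = 1.0 - self.lr * self.lam  # step on the L2 penalty term
                self.w = [wj * shrink for wj in self.w]
                if margin < 1.0:  # hinge loss is active: move w towards y_i * x_i
                    for j, v in X[i].items():
                        self.w[j] += self.lr * y[i] * v
        return self

    def predict(self, X):
        return [1 if self.decision(x) > 0 else -1 for x in X]


# Invented toy reviews, not data from the paper.
train_docs = [
    "I love this phone, great battery :)",
    "Great sound and love the screen!",
    "Terrible battery, I hate it :(",
    "Awful screen and terrible sound.",
]
labels = [1, 1, -1, -1]
vectorizer = TfidfSketch().fit(train_docs)
svm = LinearSVMSketch(dim=len(vectorizer.vocab)).fit(vectorizer.transform(train_docs), labels)
print(svm.predict(vectorizer.transform(["Great phone, love it!", "Awful, I hate it :("])))  # [1, -1]
```

In practice, scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` followed by a linear SVM such as `LinearSVC` would replace both hand-written classes; the sketch spells out each stage only so that the tokenization, weighting and hinge-loss training steps are visible.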
Pages: 6
Related papers (50 in total)
  • [41] Evolving dictionary based sentiment scoring framework for patient authored text
    Kumar, Chitturi Satya Pavan
    Babu, Lekkala Dasaratha Dhinesh
    [J]. EVOLUTIONARY INTELLIGENCE, 2021, 14 (02) : 657 - 667
  • [43] A SVM-Based Method for Sentiment Analysis in Persian Language
    Hajmohammadi, Mohammad Sadegh
    Ibrahim, Roliana
    [J]. INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2012), 2013, 8768
  • [44] Sentiment Analysis for Sarcasm Detection on Streaming Short Text Data
    Prasad, Anukarsh G.
    Sanjana, S.
    Bhat, Skanda M.
    Harish, B. S.
    [J]. PROCEEDINGS OF 2017 2ND INTERNATIONAL CONFERENCE ON KNOWLEDGE ENGINEERING AND APPLICATIONS (ICKEA), 2017 : 1 - 5
  • [45] Arabic Sentiment Analysis: Lexicon-based and Corpus-based
    Abdulla, Nawaf A.
    Ahmed, Nizar A.
    Shehab, Mohammed A.
    Al-Ayyoub, Mahmoud
    [J]. 2013 IEEE JORDAN CONFERENCE ON APPLIED ELECTRICAL ENGINEERING AND COMPUTING TECHNOLOGIES (AEECT), 2013
  • [46] SENTIMENT ANALYSIS OF MICROBLOG TEXT BASED ON JOINT SENTIMENT-TOPIC MODEL
    Zhang, Hui
    Liu, Yiqun
    Ma, Shaoping
    [J]. 2014 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2014 : 46 - 54
  • [47] Comparison of Text Sentiment Analysis based on Machine Learning
    Zhang, Xueying
    Zheng, Xianghan
    [J]. 2016 15TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC), 2016 : 230 - 233
  • [48] Integrated Framework for Keyword-based Text Data Collection and Analysis
    Cha, Minki
    Kwon, Jung-Hyok
    Lee, Sol-Bee
    Park, Jaehoon
    Youm, Sungkwan
    Kim, Eui-Jik
    [J]. SENSORS AND MATERIALS, 2018, 30 (03) : 439 - 445
  • [49] Text sentiment analysis Based on Depth Learning Model
    Zheng, Wenfei
    [J]. 2018 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, BIG DATA AND BLOCKCHAIN (ICCBB 2018), 2018 : 89 - 91
  • [50] Slang-Based Text Sentiment Analysis in Instagram
    Aly, Elton Shah
    van der Haar, Dustin Terence
    [J]. FOURTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY, VOL 2, 2020, 1027 : 321 - 329