Sentiment Classification Using Feature Selection Techniques for Text Data Composed of Heterogeneous Sources

Cited by: 0
Authors
Arya V. [1]
Agrawal R. [1]
Affiliation
[1] Manav Rachna International Institute of Research & Studies, Faridabad
Keywords
bag of words; feature selection; heterogeneous source; machine learning; sentiment classifier; TF-IDF; Word2Vec
DOI
10.2174/2666255813999200818133555
Abstract
Aims: This study analyzes feature selection techniques for sentiment classification of text data drawn from heterogeneous sources.
Objectives: The objective is to analyze feature selection techniques for text gathered from different sources in order to increase the accuracy of sentiment classification on microblogs.
Methods: Three feature selection techniques, Bag-of-Words (BOW), TF-IDF, and Word2Vec, were applied to find the most suitable one for heterogeneous datasets.
Results: With an SVM classifier, TF-IDF outperformed the other two feature selection techniques for sentiment classification.
Conclusion: Feature selection is an integral part of data preprocessing, and it is also important for machine learning algorithms to achieve good classification accuracy. Hence, it is essential to find the most suitable approach for heterogeneous sources of data. Heterogeneous sources are rich in information and also play an important role in developing models for adaptable systems. With that in mind, we compared the three techniques on heterogeneous source data and found that TF-IDF is the most suitable for all types of data, whether balanced or imbalanced, single-source or multiple-source. In all cases, TF-IDF was the most promising approach for classifying user sentiment. © 2022 Bentham Science Publishers.
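To make the difference between two of the compared representations concrete, the following is a minimal sketch of Bag-of-Words counts and TF-IDF weights on a toy corpus. The corpus, the function names `bag_of_words` and `tf_idf`, and the specific weighting (term frequency as count divided by document length, inverse document frequency as the natural log of N / df) are assumptions chosen for illustration; the abstract does not state which TF-IDF variant, tokenizer, or SVM configuration the authors used.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for short posts from different sources.
docs = [
    "great phone great battery",
    "terrible battery",
    "great service",
]

def bag_of_words(doc):
    """Return raw term counts for one whitespace-tokenized document."""
    return Counter(doc.split())

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    counts = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (n / total) * math.log(n_docs / df[t]) for t, n in c.items()}
        )
    return weights

bow = [bag_of_words(d) for d in docs]
weights = tf_idf(docs)
```

In this sketch, "great" has the highest raw count in the first document but appears in two of the three documents, so its TF-IDF weight ends up below that of the rarer term "phone". Down-weighting terms that are common across documents is the general idea behind TF-IDF; whether it explains the reported results is not something the abstract states.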
Pages: 207-214
Number of pages: 7