Multi-class sentiment classification: The experimental comparisons of feature selection and machine learning algorithms

Cited by: 129
Authors
Liu, Yang [1]
Bi, Jian-Wu [1]
Fan, Zhi-Ping [1, 2]
Affiliations
[1] Northeastern Univ, Sch Business Adm, Dept Management Sci & Engn, Shenyang 110167, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
Funding
US National Science Foundation;
Keywords
Multi-class sentiment classification; Experimental comparison; Feature selection algorithms; Machine learning algorithms; Strength detection; Information; Reviews
DOI
10.1016/j.eswa.2017.03.042
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-class sentiment classification has a wide range of applications, yet studies on it remain relatively scarce. In this paper, a framework for multi-class sentiment classification is proposed, which consists of two parts: 1) selecting important text features with a feature selection algorithm, and 2) training a multi-class sentiment classifier with a machine learning algorithm. Experiments are then conducted to compare the performance of four popular feature selection algorithms (document frequency, CHI statistics, information gain and gain ratio) and five popular machine learning algorithms (decision tree, naive Bayes, support vector machine, radial basis function neural network and K-nearest neighbor) in multi-class sentiment classification. The experiments use three public datasets comprising twelve data subsets, and 10-fold cross-validation is used to obtain the classification accuracy for each combination of feature selection algorithm, machine learning algorithm, feature set size and data subset. Based on the resulting 3600 classification accuracies (4 feature selection algorithms × 5 machine learning algorithms × 15 feature set sizes × 12 data subsets), the average classification accuracy of each algorithm is calculated, and the Wilcoxon test is used to check whether the differences between algorithms are significant. The results show that, in terms of classification accuracy, gain ratio performs best among the four feature selection algorithms and support vector machine performs best among the five machine learning algorithms. Similar comparisons are also conducted in terms of execution time. The results should be valuable for improving existing multi-class sentiment classifiers and developing new ones. (C) 2017 Elsevier Ltd. All rights reserved.
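The feature selection step named in the abstract can be illustrated with a minimal sketch. It scores terms by three of the four listed criteria (document frequency, information gain and gain ratio; CHI is left out for brevity) using the standard textbook definitions over binary term presence. This is not the authors' implementation: the toy corpus `DOCS` and the helper names `entropy`, `score_term` and `select_features` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the paper): (token set, sentiment class).
DOCS = [
    ({"great", "love", "battery"}, "positive"),
    ({"great", "screen"}, "positive"),
    ({"okay", "battery"}, "neutral"),
    ({"okay", "screen"}, "neutral"),
    ({"bad", "battery"}, "negative"),
    ({"bad", "broken", "screen"}, "negative"),
]


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def score_term(term, docs):
    """Score one term by document frequency, information gain and gain ratio."""
    labels = Counter(label for _, label in docs)
    with_term = Counter(label for tokens, label in docs if term in tokens)
    without_term = labels - with_term  # Counter subtraction drops zero counts
    n = len(docs)
    df = sum(with_term.values())
    # Class entropy conditioned on whether the term is present or absent.
    conditional = (df / n) * entropy(list(with_term.values())) + (
        (n - df) / n
    ) * entropy(list(without_term.values()))
    info_gain = entropy(list(labels.values())) - conditional
    # Gain ratio normalizes information gain by the entropy of the split itself.
    split_info = entropy([df, n - df])
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {"df": df, "ig": info_gain, "gr": gain_ratio}


def select_features(docs, k, criterion="gr"):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    vocab = sorted(set().union(*(tokens for tokens, _ in docs)))
    scores = {term: score_term(term, docs)[criterion] for term in vocab}
    return sorted(vocab, key=lambda term: (-scores[term], term))[:k]


if __name__ == "__main__":
    for criterion in ("df", "ig", "gr"):
        print(criterion, select_features(DOCS, 3, criterion))
```

On this toy corpus the `df` criterion ranks `battery` and `screen` highest even though they occur equally often in every class, while `ig` and `gr` prefer class-specific terms such as `great`, `okay` and `bad`. The selected terms would then form the feature vector passed to whichever classifier is being compared.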
Pages: 323-339
Number of pages: 17