Utilizing Ensemble, Data Sampling and Feature Selection Techniques for Improving Classification Performance on Tweet Sentiment Data

Cited by: 5
Authors
Prusa, Joseph [1]
Khoshgoftaar, Taghi M. [1]
Napolitano, Amri [1]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Funding
U.S. National Science Foundation;
Keywords
Sentiment Analysis; Tweet Mining; Classification; Bagging; Boosting; Random Undersampling; Feature Selection;
DOI
10.1109/ICMLA.2015.21
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Sentiment analysis of tweets is a popular method of opinion mining on social media. Many machine learning techniques exist that can improve the performance of classifiers trained to determine the sentiment or emotional polarity of a tweet; however, they are designed with different objectives and it is unclear which techniques are most beneficial. Additionally, these techniques may behave differently depending on data quality issues, such as class imbalance, a common problem when using real-world data. In an effort to determine which techniques are more important, we tested 12 techniques consisting of: eight feature selection techniques, bagging, boosting, and data sampling with two post-sampling class ratios. Using five base learners, we compare these techniques against each other and against each base learner with no additional technique. We train and test each classifier on a balanced dataset and two imbalanced datasets with different class ratios. Additionally, we conduct statistical tests to determine if the differences observed between techniques are significant. Our results show that bagging and seven of the eight feature selection techniques significantly improve performance (compared to using no technique) on all three datasets, while boosting and data sampling are less beneficial for imbalanced tweet sentiment data. To the best of our knowledge, this is the first study comparing these three types of techniques on tweet sentiment data and the first to show that feature selection and ensemble techniques perform better than data sampling on tweet sentiment data.
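As an illustration of the data sampling technique named in the abstract and keywords (random undersampling to a chosen post-sampling class ratio), here is a minimal sketch. It is not the authors' implementation: the function name `random_undersample`, the `minority_fraction` parameter, the toy tweets, and the example ratio are all assumptions made for illustration, since the abstract does not state the ratios or tooling used.

```python
import random
from collections import Counter


def random_undersample(samples, labels, minority_fraction, seed=0):
    """Randomly drop majority-class instances until the minority class
    makes up roughly `minority_fraction` of the data (0.5 means 50:50).

    Hypothetical helper for illustration; assumes exactly two classes.
    """
    if not 0 < minority_fraction < 1:
        raise ValueError("minority_fraction must be strictly between 0 and 1")
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, n_major), (minority, n_minor) = counts.most_common(2)

    # Majority instances to keep so that
    # n_minor / (n_minor + keep) is approximately minority_fraction.
    keep = round(n_minor * (1 - minority_fraction) / minority_fraction)
    keep = min(keep, n_major)  # undersampling never adds instances

    rng = random.Random(seed)
    major_idx = [i for i, y in enumerate(labels) if y == majority]
    kept_major = set(rng.sample(major_idx, keep))

    # Keep every minority instance plus the sampled majority instances,
    # preserving the original order.
    idx = [i for i, y in enumerate(labels) if y == minority or i in kept_major]
    return [samples[i] for i in idx], [labels[i] for i in idx]


if __name__ == "__main__":
    # Toy imbalanced data: 20 negative and 80 positive placeholder tweets.
    tweets = [f"tweet {i}" for i in range(100)]
    sentiment = ["neg"] * 20 + ["pos"] * 80
    X, y = random_undersample(tweets, sentiment, minority_fraction=0.5)
    print(Counter(y))
```

Only the training split would normally be resampled this way; the test split keeps its original class distribution so that evaluation reflects the imbalance actually present in the data.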
Pages: 535-542
Number of pages: 8