Enhancing machine learning-based sentiment analysis through feature extraction techniques

Times cited: 1
Authors
Semary, Noura A. [1]
Ahmed, Wesam [1, 2]
Amin, Khalid [1]
Plawiak, Pawel [3, 4]
Hammad, Mohamed [1, 5]
Affiliations
[1] Menoufia Univ, Fac Comp & Informat, Dept Informat Technol, Shibin Al Kawm, Egypt
[2] South Valley Univ, Fac Comp & Artificial Intelligence, Dept Informat Technol, Hurghada, Egypt
[3] Cracow Univ Technol, Fac Comp Sci & Telecommun, Dept Comp Sci, Krakow, Poland
[4] Inst Theoret & Appl Informat, Polish Acad Sci, Gliwice, Poland
[5] Prince Sultan Univ, Coll Comp & Informat Sci, EIAS Data Sci Lab, Riyadh, Saudi Arabia
Source
PLOS ONE | 2024 / Vol. 19 / No. 2
Keywords
LANGUAGE; SMOTE;
DOI
10.1371/journal.pone.0294968
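Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract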
Feature extraction is a crucial part of sentiment classification because it extracts valuable information from text data and therefore affects the model's performance. The goal of this paper is to help in selecting a suitable feature extraction method to enhance the performance of sentiment analysis tasks. To provide directions for future machine learning and feature extraction research, it is important to analyze and summarize feature extraction techniques methodically from a machine learning standpoint. The methods under consideration are Bag-of-Words (BOW), Word2Vec, N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), Hashing Vectorizer (HV), and Global Vectors for Word Representation (GloVe). To assess each feature extractor, we applied it to the Twitter US airlines and Amazon musical instrument reviews datasets. We then trained a random forest classifier on a 70%/30% train/test split, which allowed us to evaluate and compare performance using several metrics. Based on our results, the TF-IDF technique demonstrates superior performance, with an accuracy of 99% on the Amazon reviews dataset and 96% on the Twitter US airlines dataset. This study underscores the importance of feature extraction in sentiment analysis and offers practical insights for improving model performance and guiding future research.
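The count-based extractors named in the abstract can be illustrated with a minimal sketch. Assuming simple lowercase regex tokenization and a few toy sentences (not drawn from the paper's datasets), the Python below shows how Bag-of-Words, word N-grams, TF-IDF weighting, and a hashing vectorizer each turn text into numeric feature vectors. The TF-IDF step uses the smoothed IDF, ln((1 + N) / (1 + df)) + 1, followed by L2 normalization, which matches scikit-learn's `TfidfVectorizer` defaults; the hashing step uses CRC32 in place of the signed MurmurHash3 scheme of scikit-learn's `HashingVectorizer`. All function names are hypothetical and this is not the authors' code. Word2Vec and GloVe are left out because they require trained embeddings.

```python
import math
import re
import zlib
from collections import Counter


def tokenize(text):
    # Assumption: lowercase, then keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    # Contiguous word n-grams joined by a space, e.g. ["not", "great"] -> "not great".
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, ngram_range=(1, 1)):
    """Count matrix over a sorted vocabulary; ngram_range=(1, 2) adds bigram (N-gram) features."""
    low, high = ngram_range
    doc_counts = []
    for doc in docs:
        tokens = tokenize(doc)
        grams = []
        for n in range(low, high + 1):
            grams.extend(ngrams(tokens, n))
        doc_counts.append(Counter(grams))
    vocab = sorted(set().union(*doc_counts))
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, counts in enumerate(doc_counts):
        for term, count in counts.items():
            matrix[i][index[term]] = count
    return vocab, matrix


def tf_idf(matrix):
    """Reweight raw counts with smoothed IDF, then L2-normalize each document row."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # Document frequency: the number of documents in which each term appears.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted_rows = []
    for row in matrix:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0  # avoid dividing by zero
        weighted_rows.append([v / norm for v in weighted])
    return weighted_rows


def hashing_vectorize(docs, n_features=16):
    """Hashing trick: map each token to one of n_features buckets without storing a vocabulary."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for token in tokenize(doc):
            # CRC32 is deterministic across runs, unlike Python's salted built-in hash().
            row[zlib.crc32(token.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


# Toy reviews (illustrative only, not from the paper's datasets).
docs = [
    "The flight was great",
    "The flight was not great",
    "Lost my bag, not great",
]
vocab, counts = bag_of_words(docs, ngram_range=(1, 2))
weights = tf_idf(counts)
hashed = hashing_vectorize(docs, n_features=16)
```

Because a term that occurs in every document receives the minimum IDF of 1, a word such as "great" in these toy reviews is down-weighted relative to rarer terms in the same review. The hashing vectorizer gives up that interpretability in exchange for a fixed-size vector with no stored vocabulary, at the cost of possible bucket collisions.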
Pages: 19