Enhancing machine learning-based sentiment analysis through feature extraction techniques

Cited by: 1
Authors
Semary, Noura A. [1]
Ahmed, Wesam [1, 2]
Amin, Khalid [1]
Plawiak, Pawel [3, 4]
Hammad, Mohamed [1, 5]
Affiliations
[1] Menoufia Univ, Fac Comp & Informat, Dept Informat Technol, Shibin Al Kawm, Egypt
[2] South Valley Univ, Fac Comp & Artificial Intelligence, Dept Informat Technol, Hurghada, Egypt
[3] Cracow Univ Technol, Fac Comp Sci & Telecommun, Dept Comp Sci, Krakow, Poland
[4] Inst Theoret & Appl Informat, Polish Acad Sci, Gliwice, Poland
[5] Prince Sultan Univ, Coll Comp & Informat Sci, EIAS Data Sci Lab, Riyadh, Saudi Arabia
Source
PLOS ONE | 2024 / Vol. 19 / Issue 02
Keywords
LANGUAGE; SMOTE;
DOI
10.1371/journal.pone.0294968
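Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code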
07 ; 0710 ; 09 ;
Abstract
Feature extraction is a crucial part of sentiment classification because it extracts valuable information from text data, which directly affects the model's performance. The goal of this paper is to help in selecting a suitable feature extraction method to enhance the performance of sentiment analysis tasks. To provide directions for future machine learning and feature extraction research, it is important to analyze and summarize feature extraction techniques methodically from a machine learning standpoint. Several methods are considered, including Bag-of-Words (BOW), Word2Vec, N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), Hashing Vectorizer (HV), and Global Vectors for Word Representation (GloVe). To assess the ability of each feature extractor, we applied it to the Twitter US airlines and Amazon musical instrument reviews datasets. Finally, we trained a random forest classifier using 70% of the data for training and 30% for testing, enabling us to evaluate and compare performance using different metrics. Based on our results, we find that the TF-IDF technique demonstrates superior performance, with an accuracy of 99% on the Amazon reviews dataset and 96% on the Twitter US airlines dataset. This study underscores the importance of feature extraction in sentiment analysis and offers practical insights for improving model performance and guiding future research.
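As a rough illustration of the TF-IDF weighting named in the abstract, the following is a minimal standard-library sketch. It assumes the textbook form tf × log(N / df) with whitespace tokenization; the function name `tfidf` and the toy sentences are hypothetical, and the smoothing and normalization used in the paper's experiments may well differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: plain whitespace tokenization and unsmoothed IDF; this is
    an illustrative variant, not the configuration used in the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, loosely styled after airline and product reviews.
docs = [
    "great flight great crew",
    "flight delayed again",
    "great guitar strings",
]
vectors = tfidf(docs)
```

In this sketch, a term that appears in fewer documents (such as "crew") receives a higher weight than an equally frequent term that appears in more documents (such as "flight"), and a term present in every document gets weight zero.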
Pages: 19