The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis

Cited by: 51
Authors
Alam, Saqib [1]
Yao, Nianmin [1]
Affiliation
[1] Dalian University of Technology, Department of Electronic Information and Electrical Engineering, Black Building, 2 Linggong Road, Dalian 116024, People's Republic of China
Keywords
Preprocessing; Machine learning; Sentiment analysis; Word2Vec; Feature selection; Classification; Framework; Reviews
DOI
10.1007/s10588-018-9266-8
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Big data and its related technologies have recently become active areas of research. A huge amount of data is generated every second, much of it unstructured, and such unstructured data is now a major topic of interest for researchers. Considerable research is currently under way in text analytics and text preprocessing. In this paper, we study the impact of different text preprocessing techniques on the accuracy of sentiment classification using three well-known machine learning classifiers: Naive Bayes (NB), maximum entropy (MaxE), and support vector machines (SVM). We measured the accuracy of the three algorithms before and after applying the preprocessing steps. The results showed that the accuracy of NB improved significantly after preprocessing and the accuracy of SVM improved slightly, while, interestingly, the accuracy of MaxE did not improve. As a comparative study, our results also showed that after preprocessing, NB achieved significantly higher accuracy than the other algorithms, followed by MaxE and SVM. This work shows that text preprocessing affects the accuracy of machine learning algorithms, and in particular that it significantly improves the accuracy of NB.
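The abstract does not list the individual preprocessing steps, so the sketch below assumes a common set (lowercasing, stripping URLs and @mentions, removing non-letters, and removing a small hypothetical stopword list) and pairs it with a from-scratch multinomial Naive Bayes classifier to show how a before/after accuracy comparison can be set up. The names `preprocess`, `raw_tokens`, `MultinomialNB`, `compare`, the stopword list, and the four-sentence dataset are all hypothetical illustrations, not the authors' code or data; a MaxE or SVM model exposing the same `fit`/`predict` interface could be compared in the same way.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list for illustration only; the abstract does not say
# which stopword list (if any) the authors used.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "i", "to", "of"}


def raw_tokens(text):
    """Baseline ("before preprocessing"): split on whitespace only."""
    return text.split()


def preprocess(text):
    """Assumed preprocessing steps: lowercase, drop URLs and @mentions,
    replace everything except letters with spaces, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # URLs and user mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # punctuation and digits
    return [tok for tok in text.split() if tok not in STOPWORDS]


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one (Laplace) smoothing."""

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # log P(c) + sum of smoothed log P(w | c) over the document's tokens
            score = math.log(n_c / n_docs)
            for w in tokens:
                if w in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:  # ties go to the class seen first in training
                best_label, best_score = c, score
        return best_label


def accuracy(model, token_lists, labels):
    correct = sum(model.predict(t) == y for t, y in zip(token_lists, labels))
    return correct / len(labels)


def compare(train, test):
    """Train and score the same classifier with and without preprocessing."""
    results = {}
    for name, tokenize in (("raw", raw_tokens), ("preprocessed", preprocess)):
        model = MultinomialNB().fit([tokenize(t) for t, _ in train], [y for _, y in train])
        results[name] = accuracy(model, [tokenize(t) for t, _ in test], [y for _, y in test])
    return results


# Made-up toy data, for illustration only (not the paper's datasets).
train_set = [
    ("Great movie, loved it!", "pos"),
    ("What a great film", "pos"),
    ("Terrible movie, hated it.", "neg"),
    ("Awful film, so boring", "neg"),
]
test_set = [("GREAT film!!", "pos"), ("Boring, awful.", "neg")]

print(compare(train_set, test_set))
```

On these toy sentences the raw tokenizer treats `Boring,` and `boring` as different words, so test words go unmatched against the training vocabulary; that kind of vocabulary fragmentation is one plausible reason preprocessing helps a count-based model such as NB. The printed accuracies come from four invented sentences and say nothing about the paper's results.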
Pages: 319-335
Number of pages: 17
Related papers (50 in total)
  • [1] The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis
    Alam, Saqib
    Yao, Nianmin
    [J]. Computational and Mathematical Organization Theory, 2019, 25: 319-335
  • [2] Optimizing Machine Learning-based Sentiment Analysis Accuracy in Bilingual Sentences via Preprocessing Techniques
    Maree, Mohammed
    Eleyat, Mujahed
    Mesqali, Enas
    [J]. International Arab Journal of Information Technology, 2024, 21(2): 257-270
  • [3] Sentiment Analysis Using Machine Learning Algorithms
    Jemai, Fatma
    Hayouni, Mohamed
    Baccar, Sahbi
    [J]. IWCMC 2021: 2021 17th International Wireless Communications & Mobile Computing Conference (IWCMC), 2021: 775-779
  • [4] Various Machine Learning Algorithms for Twitter Sentiment Analysis
    Singh, Rishija
    Goel, Vikas
    [J]. Information and Communication Technology for Competitive Strategies, 2019, 40: 763-772
  • [5] Sentiment Analysis in Turkish Text with Machine Learning Algorithms
    Rumelli, Merve
    Akkus, Deniz
    Kart, Ozge
    Isik, Zerrin
    [J]. 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), 2019: 123-127
  • [6] Preprocessing Impact on Turkish Sentiment Analysis
    Mulki, Hala
    Haddad, Hatem
    Ali, Chedi Bechikh
    Babaoglu, Ismail
    [J]. 2018 26th Signal Processing and Communications Applications Conference (SIU), 2018
  • [7] Sentiment Analysis on Different Domains Using Machine Learning Algorithms
    Ahuja, Ravinder
    Sharma, S. C.
    [J]. Advances in Data and Information Sciences, 2022, 318: 143-153
  • [8] Comparative Study of Machine Learning Algorithms for Movie Sentiment Analysis
    Arfaoui, Nouha
    [J]. Journal of Information Assurance and Security, 2023, 18(1): 25-38
  • [9] Comparative Study of Machine Learning Algorithms for Twitter Sentiment Analysis
    Indulkar, Yash
    Patil, Abhijit
    [J]. 2021 International Conference on Emerging Smart Computing and Informatics (ESCI), 2021: 295-299
  • [10] Sentiment Analysis of Twitter Posts using Machine Learning Algorithms
    Gupta, Ashutosh
    Singh, Anusha
    Pandita, Ishan
    Parashar, Harsh
    [J]. Proceedings of the 2019 6th International Conference on Computing for Sustainable Global Development (INDIACom), 2019: 980-983