Weakening Feature Independence of Naive Bayes Using Feature Weighting and Selection on Imbalanced Customer Review Data

Citations: 0
Authors
Cahya, Reiza Adi [1 ]
Bachtiar, Fitra A. [1 ]
Affiliations
[1] Brawijaya Univ, Fac Comp Sci, Malang, Indonesia
Keywords
sentiment analysis; genetic algorithm; imbalanced data; naive Bayes; feature selection; feature weighting; CONSTRUCTION; ALGORITHM;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
E-commerce sites provide review sections where users can express their opinions about products or services. Decision makers, on the other hand, can use these abundant reviews to analyze which aspects of a product or service should be improved, a task known as sentiment analysis. Naive Bayes (NB) is a popular method for sentiment analysis because it is considerably faster than other methods while offering comparable performance. One weakness of NB, however, is its assumption that each feature is independent of the others. This assumption does not hold in sentiment analysis because terms are correlated with one another. Two approaches, feature weighting (FW) and feature selection (FS), are used to weaken this assumption. Both approaches use a genetic algorithm (GA) to find optimal weights and feature subsets based on correlation and odds ratio, taking the imbalanced review data into account. Experiments on the Women Ecommerce Clothing Review dataset show that the FW approach performs comparably to unweighted NB, while FS yields worse results than NB. It can be concluded that the proposed FW and FS schemes cannot improve on standard NB.
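The weighting idea described in the abstract can be sketched as follows: in a multinomial naive Bayes classifier, each term's log-likelihood contribution is multiplied by a per-feature weight, so that weights of 1.0 recover standard NB and an outer search procedure such as a GA can tune the weight vector. This is a minimal illustrative sketch under those assumptions, not the paper's exact scheme; the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict

def train_nb(docs, labels, vocab):
    """Train multinomial naive Bayes: class priors plus Laplace-smoothed
    per-class term likelihoods.  `docs` is a list of token lists."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: defaultdict(int) for c in classes}
    totals = {c: 0 for c in classes}
    for doc, y in zip(docs, labels):
        for term in doc:
            counts[y][term] += 1
            totals[y] += 1
    likelihood = {
        c: {t: (counts[c][t] + 1) / (totals[c] + len(vocab)) for t in vocab}
        for c in classes
    }
    return prior, likelihood

def classify_weighted(doc, prior, likelihood, weights):
    """Score each class as log P(c) + sum_t w_t * log P(t|c).
    Weights of 1.0 (the default) give standard NB; in an FW scheme a GA
    would search over the `weights` vector (illustrative assumption)."""
    best, best_score = None, -math.inf
    for c in prior:
        score = math.log(prior[c])
        for term in doc:
            if term in likelihood[c]:
                score += weights.get(term, 1.0) * math.log(likelihood[c][term])
        if score > best_score:
            best, best_score = c, score
    return best
```

With all weights left at 1.0 the classifier behaves exactly like unweighted NB, which matches the paper's finding that the learned weights gave comparable rather than improved results.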
Pages: 182-187
Page count: 6