Feature selection for unbalanced class distribution and Naive Bayes

Cited by: 0
Authors
Mladenic, D [1 ]
Grobelnik, M [1 ]
Institution
[1] Jozef Stefan Inst, Dept Intelligent Syst, Ljubljana 1000, Slovenia
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline classification
081104; 0812; 0835; 1405
Abstract
This paper describes an approach to feature subset selection that takes into account both problem specifics and learning algorithm characteristics. It is developed for the Naive Bayesian classifier applied to text data, since that classifier suits the learning problems addressed. We focus on domains with many features that also have a highly unbalanced class distribution and asymmetric misclassification costs given only implicitly in the problem. By asymmetric misclassification costs we mean that one of the class values is the target class for which we want predictions, and we prefer false positives over false negatives. Our example problem is automatic document categorization using machine learning, where we want to identify documents relevant to the selected category. Usually, only about 1%-10% of examples belong to the selected category. Our experimental comparison of eleven feature scoring measures shows that considering domain and algorithm characteristics significantly improves classification results.
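One family of feature scoring measures of the kind the abstract compares is the odds ratio, which rewards words that are frequent in the rare target class and rare elsewhere. The sketch below is an illustrative assumption, not the paper's exact formulation: it uses add-one (Laplace) smoothing and a log-odds form, with hypothetical document counts.

```python
import math

def odds_ratio(pos_with, pos_total, neg_with, neg_total):
    """Log odds-ratio score of a feature for the target class.

    pos_with / neg_with: number of positive / negative documents
    containing the feature; *_total: class sizes. Add-one smoothing
    (an assumption, not from the paper) avoids division by zero.
    """
    p_pos = (pos_with + 1) / (pos_total + 2)   # P(word | positive)
    p_neg = (neg_with + 1) / (neg_total + 2)   # P(word | negative)
    return math.log((p_pos * (1 - p_neg)) / ((1 - p_pos) * p_neg))

# Toy unbalanced corpus: 5 positive vs. 95 negative documents (~5%).
# A word in 4/5 positives but only 5/95 negatives scores high,
# even though it is rare in the corpus overall.
score = odds_ratio(4, 5, 5, 95)
```

Unlike corpus-frequency measures, this score depends on the target class alone, which is why such measures tend to behave better when only 1%-10% of examples are positive.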
Pages: 258-267 (10 pages)
Related papers (showing 10 of 50)
  • [1] Naive Feature Selection: Sparsity in Naive Bayes
    Askari, Armin
    d'Aspremont, Alex
    El Ghaoui, Laurent
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 1813 - 1821
  • [2] Feature selection for optimizing the Naive Bayes algorithm
    Winarti, Titin
    Vydia, Vensy
    [J]. ENGINEERING, INFORMATION AND AGRICULTURAL TECHNOLOGY IN THE GLOBAL DIGITAL REVOLUTION, 2020, : 47 - 51
  • [3] Feature selection for text classification with Naive Bayes
    Chen, Jingnian
    Huang, Houkuan
    Tian, Shengfeng
    Qu, Youli
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 5432 - 5435
  • [4] Learning naive Bayes for probability estimation by feature selection
    Jiang, Liangxiao
    Zhang, Harry
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4013 : 503 - 514
  • [5] Naive Feature Selection: A Nearly Tight Convex Relaxation for Sparse Naive Bayes
    Askari, Armin
    d'Aspremont, Alexandre
    El Ghaoui, Laurent
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 2024, 49 (01)
  • [6] Text Classification Based on Naive Bayes Algorithm with Feature Selection
    Chen, Zhenguo
    Shi, Guang
    Wang, Xiaoju
    [J]. INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (10): 4255 - 4260
  • [7] A New Feature Selection Approach to Naive Bayes Text Classifiers
    Zhang, Lungan
    Jiang, Liangxiao
    Li, Chaoqun
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2016, 30 (02)
  • [8] Feature selection for multi-label naive Bayes classification
    Zhang, Min-Ling
    Pena, Jose M.
    Robles, Victor
    [J]. INFORMATION SCIENCES, 2009, 179 (19) : 3218 - 3229
  • [9] Feature subset selection using naive Bayes for text classification
    Feng, Guozhong
    Guo, Jianhua
    Jing, Bing-Yi
    Sun, Tieli
    [J]. PATTERN RECOGNITION LETTERS, 2015, 65 : 109 - 115
  • [10] Naive Bayes-Guided Bat Algorithm for Feature Selection
    Taha, Ahmed Majid
    Mustapha, Aida
    Chen, Soong-Der
    [J]. SCIENTIFIC WORLD JOURNAL, 2013