Feature selection for unbalanced class distribution and Naive Bayes

Authors
Mladenic, D [1 ]
Grobelnik, M [1 ]
Affiliation
[1] Jozef Stefan Inst, Dept Intelligent Syst, Ljubljana 1000, Slovenia
Chinese Library Classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
This paper describes an approach to feature subset selection that takes into account both problem specifics and the characteristics of the learning algorithm. It is developed for the Naive Bayesian classifier applied to text data, since that classifier is well suited to the learning problems we address. We focus on domains with many features, a highly unbalanced class distribution, and asymmetric misclassification costs that are given only implicitly in the problem. By asymmetric misclassification costs we mean that one of the class values is the target class for which we want predictions, and we prefer false positives over false negatives. Our example problem is automatic document categorization using machine learning, where we want to identify documents relevant to the selected category. Usually, only about 1%-10% of the examples belong to the selected category. Our experimental comparison of eleven feature scoring measures shows that considering domain and algorithm characteristics significantly improves classification results.
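The record does not enumerate the eleven feature scoring measures that were compared. As one illustrative example, the sketch below computes a log odds-ratio score per term, a measure commonly paired with Naive Bayes on unbalanced text data, assuming tokenized documents and a single rare target class; the function name, smoothing choice, and toy data are hypothetical, not taken from the paper.

```python
import math
from collections import Counter

def odds_ratio_scores(docs, labels, target_label, smoothing=1.0):
    """Score each term by log odds ratio with respect to the target class.

    docs: list of token lists; labels: parallel list of class labels.
    Higher scores favor terms that are frequent in the (rare) target class,
    which suits feature selection for Naive Bayes on unbalanced text data.
    """
    pos_docs = [d for d, y in zip(docs, labels) if y == target_label]
    neg_docs = [d for d, y in zip(docs, labels) if y != target_label]

    # Document frequency of each term in positive / negative documents.
    pos_df = Counter(t for d in pos_docs for t in set(d))
    neg_df = Counter(t for d in neg_docs for t in set(d))
    vocab = set(pos_df) | set(neg_df)

    n_pos, n_neg = len(pos_docs), len(neg_docs)
    scores = {}
    for t in vocab:
        # Smoothed estimates of P(term | positive) and P(term | negative).
        p_pos = (pos_df[t] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_df[t] + smoothing) / (n_neg + 2 * smoothing)
        scores[t] = math.log((p_pos * (1 - p_neg)) / ((1 - p_pos) * p_neg))
    return scores

if __name__ == "__main__":
    # Toy usage: keep the top-k scoring terms as the selected feature subset.
    docs = [["cheap", "offer", "now"], ["meeting", "agenda"],
            ["offer", "discount"], ["project", "report", "agenda"]]
    labels = ["spam", "other", "spam", "other"]
    scores = odds_ratio_scores(docs, labels, target_label="spam")
    print(sorted(scores, key=scores.get, reverse=True)[:2])
```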
Pages: 258-267 (10 pages)
Related papers (50 in total)
  • [21] A Naive Bayes approach for URL classification with supervised feature selection and rejection framework
    Rajalakshmi, R.
    Aravindan, Chandrabose
    [J]. COMPUTATIONAL INTELLIGENCE, 2018, 34 (01) : 363 - 396
  • [22] Speeding up incremental wrapper feature subset selection with Naive Bayes classifier
    Bermejo, Pablo
    Gamez, Jose A.
    Puerta, Jose M.
    [J]. KNOWLEDGE-BASED SYSTEMS, 2014, 55 : 140 - 147
  • [23] Constrained Naive Bayes with application to unbalanced data classification
    Blanquero, Rafael
    Carrizosa, Emilio
    Ramirez-Cobo, Pepa
    Sillero-Denamiel, M. Remedios
    [J]. CENTRAL EUROPEAN JOURNAL OF OPERATIONS RESEARCH, 2022, 30 (04) : 1403 - 1425
  • [24] Weakening Feature Independence of Naive Bayes Using Feature Weighting and Selection on Imbalanced Customer Review Data
    Cahya, Reiza Adi
    Bachtiar, Fitra A.
    [J]. 2019 5TH INTERNATIONAL CONFERENCE ON SCIENCE IN INFORMATION TECHNOLOGY (ICSITECH): EMBRACING INDUSTRY 4.0 - TOWARDS INNOVATION IN CYBER PHYSICAL SYSTEM, 2019, : 182 - 187
  • [25] Class dependent feature scaling method using naive Bayes classifier for text datamining
    Youn, Eunseog
    Jeong, Myong K.
    [J]. PATTERN RECOGNITION LETTERS, 2009, 30 (05) : 477 - 485
  • [26] Feature Selection for Chemical Compound Extraction using Wrapper Approach with Naive Bayes Classifier
    Alshaikhdeeb, Basel
    Ahmad, Kamsuriah
    [J]. PROCEEDINGS OF THE 2017 6TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS (ICEEI'17), 2017,
  • [27] Variable selection for Naive Bayes classification
    Blanquero, Rafael
    Carrizosa, Emilio
    Ramirez-Cobo, Pepa
    Remedios Sillero-Denamiel, M.
    [J]. COMPUTERS & OPERATIONS RESEARCH, 2021, 135
  • [28] A Method for Avoiding Bias from Feature Selection with Application to Naive Bayes Classification Models
    Li, Longhai
    Zhang, Jianguo
    Neal, Radford M.
    [J]. BAYESIAN ANALYSIS, 2008, 3 (01): : 171 - 196
  • [29] An Improved Feature Selection Based on Naive Bayes with Kernel Density Estimator for Opinion Mining
    Raja Rajeswari Sethuraman
    John Sanjeev Kumar Athisayam
    [J]. Arabian Journal for Science and Engineering, 2021, 46 : 4059 - 4071
  • [30] Robust Method of Sparse Feature Selection for Multi-Label Classification with Naive Bayes
    Ruta, Dymitr
    [J]. FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2014, 2 : 375 - 380