Incorporating receiver operating characteristics into naive Bayes for unbalanced data classification

Cited by: 14
Authors
Kim, Taeheung [1 ]
Chung, Byung Do [2 ]
Lee, Jong-Seok [1 ]
Affiliations
[1] Sungkyunkwan Univ, Dept Ind Engn, Suwon 16419, South Korea
[2] Yonsei Univ, Dept Informat & Ind Engn, 50 Yonsei Ro, Seoul 03722, South Korea
Keywords
Unbalanced classification; Weighted naive Bayes; Receiver operating characteristics; Area under ROC curve
DOI
10.1007/s00607-016-0483-z
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Naive Bayesian classification has been widely used in data mining because of its simplicity and its robustness to missing values and irrelevant attributes. However, naive Bayes classifiers sometimes perform poorly because of their unrealistic assumptions that all attributes are equally important and conditionally independent of one another. In this research, we dispense with the former assumption by proposing a new attribute weighting method. The proposed method treats each attribute as a single classifier and measures its discriminating ability using the area under the ROC curve (AUC). Each AUC value is then used to weight the corresponding attribute. In addition, we try to reduce the complexity of classification models by selecting attributes with high AUC. Using 20 real datasets from the UC Irvine (UCI) machine learning repository, we conduct a numerical experiment showing that the proposed method improves on standard naive Bayes classification and existing weighting methods.
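The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how each AUC value enters the model, so the sketch assumes Gaussian class-conditional densities, a per-attribute log-likelihood ratio as the single-attribute score, AUC computed on the training data, and the AUC used as an exponent on each attribute's likelihood (a common weighted naive Bayes form). All function names, the synthetic data, and the 0.6 selection threshold are hypothetical.

```python
import numpy as np

def auc_from_scores(scores, y):
    """AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for v in np.unique(scores):
        mask = scores == v
        ranks[mask] = ranks[mask].mean()
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    u = ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def fit_gaussian_nb(X, y):
    """Per-class means, variances and priors for a binary problem (classes 0 and 1)."""
    classes = np.array([0, 1])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    priors = np.array([(y == c).mean() for c in classes])
    return means, variances, priors

def per_attribute_log_lik(X, means, variances):
    """Gaussian log-density, shape (n_samples, n_classes, n_attributes)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None])

def attribute_auc_weights(X, y, means, variances):
    """Treat each attribute as its own classifier and score it by AUC (assumed scheme)."""
    ll = per_attribute_log_lik(X, means, variances)
    llr = ll[:, 1, :] - ll[:, 0, :]  # evidence for class 1 from one attribute alone
    return np.array([auc_from_scores(llr[:, j], y) for j in range(X.shape[1])])

def predict_weighted(X, means, variances, priors, weights):
    """Weighted naive Bayes: each attribute's log-likelihood is scaled by its weight."""
    ll = per_attribute_log_lik(X, means, variances)
    scores = np.log(priors)[None, :] + (ll * weights[None, None, :]).sum(axis=2)
    return scores.argmax(axis=1)

# Synthetic unbalanced data (hypothetical): attribute 0 is informative, attribute 1 is noise.
rng = np.random.default_rng(0)
X_neg = rng.normal(0.0, 1.0, size=(450, 2))
X_pos = np.column_stack([rng.normal(3.0, 1.0, 50), rng.normal(0.0, 1.0, 50)])
X = np.vstack([X_neg, X_pos])
y = np.concatenate([np.zeros(450, dtype=int), np.ones(50, dtype=int)])

means, variances, priors = fit_gaussian_nb(X, y)
weights = attribute_auc_weights(X, y, means, variances)
pred = predict_weighted(X, means, variances, priors, weights)

# Attribute selection: keep only attributes whose AUC clears a hypothetical threshold.
keep = weights >= 0.6
pred_selected = predict_weighted(X[:, keep], means[:, keep], variances[:, keep],
                                 priors, weights[keep])
```

In this sketch an uninformative attribute receives a weight near 0.5 and is damped rather than removed, while the selection step drops low-AUC attributes entirely. Whether the paper rescales the AUC values before weighting is not stated in the abstract.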
Pages: 203 - 218
Number of pages: 16
Related papers
50 in total
  • [1] Incorporating receiver operating characteristics into naive Bayes for unbalanced data classification
    Taeheung Kim
    Byung Do Chung
    Jong-Seok Lee
    Computing, 2017, 99 : 203 - 218
  • [2] Constrained Naive Bayes with application to unbalanced data classification
    Blanquero, Rafael
    Carrizosa, Emilio
    Ramirez-Cobo, Pepa
    Sillero-Denamiel, M. Remedios
    CENTRAL EUROPEAN JOURNAL OF OPERATIONS RESEARCH, 2022, 30 (04) : 1403 - 1425
  • [3] Naive Bayes for text classification with unbalanced classes
    Frank, Eibe
    Bouckaert, Remco R.
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS, 2006, 4213 : 503 - 510
  • [4] Naive Bayes Classification of Uncertain Data
    Ren, Jiangtao
    Lee, Sau Dan
    Chen, Xianlu
    Kao, Ben
    Cheng, Reynold
    Cheung, David
    2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 944 - +
  • [6] Data Classification Using Rough Sets and Naive Bayes
    Al-Aidaroos, Khadija
    Abu Bakar, Azuraliza
    Othman, Zalinda
    ROUGH SET AND KNOWLEDGE TECHNOLOGY (RSKT), 2010, 6401 : 134 - 142
  • [7] Modeling naive bayes imputation classification for missing data
    Khotimah, B. K.
    Miswanto
    Suprajitno, H.
    FIRST INTERNATIONAL CONFERENCE ON ENVIRONMENTAL GEOGRAPHY AND GEOGRAPHY EDUCATION (ICEGE), 2019, 243
  • [8] Constrained Naïve Bayes with application to unbalanced data classification
    Rafael Blanquero
    Emilio Carrizosa
    Pepa Ramírez-Cobo
    M. Remedios Sillero-Denamiel
    Central European Journal of Operations Research, 2022, 30 : 1403 - 1425
  • [9] CHARACTERISTICS OF BAYES RECEIVER
    RADCHENKO, TA
    TRIFONOV, AP
    RADIOTEKHNIKA I ELEKTRONIKA, 1978, 23 (04): : 850 - 853
  • [10] Naive Bayes Classification Algorithm Based on Optimized Training Data
    Zhu, Xiaodan
    Su, Jinsong
    Wu, Qingfeng
    Dong, Huailin
    MECHATRONICS AND INTELLIGENT MATERIALS II, PTS 1-6, 2012, 490-495 : 460 - 464