Incorporating receiver operating characteristics into naive Bayes for unbalanced data classification

Cited by: 14
Authors
Kim, Taeheung [1 ]
Chung, Byung Do [2 ]
Lee, Jong-Seok [1 ]
Affiliations
[1] Sungkyunkwan Univ, Dept Ind Engn, Suwon 16419, South Korea
[2] Yonsei Univ, Dept Informat & Ind Engn, 50 Yonsei Ro, Seoul 03722, South Korea
Keywords
Unbalanced classification; Weighted naive Bayes; Receiver operating characteristics; Area under ROC curve
DOI
10.1007/s00607-016-0483-z
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Naive Bayesian classification has been widely used in data mining because of its simplicity and its robustness to missing values and irrelevant attributes. However, naive Bayes classifiers sometimes perform poorly because of their unrealistic assumptions that all attributes are equally important and conditionally independent of one another. In this research, we dispense with the former assumption by proposing a new attribute weighting method. The proposed method treats each attribute as a single classifier and measures its discriminating ability using the area under the ROC curve (AUC). Each AUC value is then used to weight the corresponding attribute. In addition, we try to reduce the complexity of classification models by selecting high-AUC attributes. Using 20 real datasets from the machine learning repository at UC Irvine (UCI), we conduct a numerical experiment to show that the proposed method improves on standard naive Bayes classification and existing weighting methods.
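The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything below is an assumption: binary labels coded 0/1, categorical attributes, Laplace smoothing, each attribute's AUC used directly as an exponent on its conditional likelihood, and a hypothetical `min_auc` threshold standing in for the attribute selection step. The function names `auc`, `fit`, and `predict` are illustrative only and are not taken from the paper.

```python
from collections import Counter
from math import log


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit(X, y, alpha=1.0, min_auc=0.0):
    """Fit a categorical naive Bayes model with one AUC-based weight per attribute.

    Assumption: an attribute whose AUC falls below `min_auc` gets weight 0,
    which removes it from the decision rule.
    """
    classes = Counter(y)
    priors = {c: n / len(y) for c, n in classes.items()}
    cond, weights = [], []
    for j in range(len(X[0])):
        values = {row[j] for row in X}
        counts = {c: Counter() for c in classes}
        for row, c in zip(X, y):
            counts[c][row[j]] += 1
        # Laplace-smoothed estimate of P(x_j = v | class c)
        p = {c: {v: (counts[c][v] + alpha) / (classes[c] + alpha * len(values))
                 for v in values}
             for c in classes}
        cond.append(p)
        # Treat attribute j alone as a classifier: score = P(class 1 | x_j)
        scores = [p[1][row[j]] * priors[1]
                  / (p[1][row[j]] * priors[1] + p[0][row[j]] * priors[0])
                  for row in X]
        a = auc(scores, y)
        weights.append(a if a >= min_auc else 0.0)
    return priors, cond, weights


def predict(model, row):
    """Return the class maximizing log P(c) + sum_j w_j * log P(x_j | c)."""
    priors, cond, weights = model

    def score(c):
        return log(priors[c]) + sum(
            w * log(cond[j][c][row[j]]) for j, w in enumerate(weights))

    return max(priors, key=score)


# Toy unbalanced data: attribute 0 is informative, attribute 1 is mostly noise.
X = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "y"),
     ("b", "x"), ("b", "y"), ("b", "x"), ("a", "y")]
y = [1, 1, 1, 0, 0, 0, 0, 0]
model = fit(X, y)
weights = model[2]
```

On this toy data the informative attribute receives the larger weight, so it dominates the decision. The sketch evaluates each AUC on the training data and does not handle attribute values unseen during fitting; a practical version would estimate the weights on held-out data.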
Pages: 203-218
Number of pages: 16
Related papers
50 in total
  • [31] Privacy preserving naive Bayes classification
    Zhang, P
    Tong, YH
    Tang, SW
    Yang, DQ
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 744 - 752
  • [32] Lyrics Classification using Naive Bayes
    Buzic, Dalibor
    Dobsa, Jasminka
    2018 41ST INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2018, : 1011 - 1015
  • [33] Private naive bayes classification of personal biomedical data: Application in cancer data analysis
    Wood, Alexander
    Shpilrain, Vladimir
    Najarian, Kayvan
    Kahrobaei, Delaram
    COMPUTERS IN BIOLOGY AND MEDICINE, 2019, 105 : 144 - 150
  • [34] Scalable Sentiment Classification for Big Data Analysis Using Naive Bayes Classifier
    Liu, Bingwei
    Blasch, Erik
    Chen, Yu
    Shen, Dan
    Chen, Genshe
    2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013
  • [35] Integrating Data Mining Techniques for Naive Bayes Classification: Applications to Medical Datasets
    Changpetch, Pannapa
    Pitpeng, Apasiri
    Hiriote, Sasiprapa
    Yuangyai, Chumpol
    COMPUTATION, 2021, 9 (09)
  • [36] A Text Classification Approach using Parallel Naive Bayes in Big Data Context
    Amazal, Houda
    Ramdani, Mohammed
    Kissi, Mohamed
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS: THEORIES AND APPLICATIONS (SITA'18), 2018
  • [37] Naive Bayes texture classification applied to whisker data from a moving robot
    Lepora, Nathan F.
    Evans, Mat
    Fox, Charles W.
    Diamond, Mathew E.
    Gurney, Kevin
    Prescott, Tony J.
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010
  • [38] Random vector functional link with naive Bayes for classification problems of mixed data
    Ruz, Gonzalo A.
    Henriquez, Pablo A.
    2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 1749 - 1752
  • [39] Naive Bayes classification model for isotopologue detection in LC-HRMS data
    van Herwerden, Denice
    O'Brien, Jake W.
    Choi, Phil M.
    Thomas, Kevin V.
    Schoenmakers, Peter J.
    Samanipour, Saer
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2022, 223
  • [40] Naive Bayes Classification Ensembles to Support Modeling Decisions in Data Stream Mining
    Lutu, Patricia E. N.
    2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2015, : 335 - 340