Modeling naive bayes imputation classification for missing data

Cited by: 1
Authors
Khotimah, B. K. [1, 3]
Miswanto [1, 2]
Suprajitno, H. [1, 2]
Affiliations
[1] Univ Airlangga, Fac Sci & Technol, Surabaya, Indonesia
[2] Univ Airlangga, Dept Math, Surabaya, Indonesia
[3] Univ Trunojoyo Madura, Dept Informat Engn, Bangkalan, Indonesia
Keywords
VALUES;
DOI
10.1088/1755-1315/243/1/012111
Chinese Library Classification number
G40 [Education];
Discipline classification codes
040101; 120403;
Abstract
Naive Bayes Imputation (NBI) fills in missing values by replacing the missing attribute information according to a probability estimate. The NBI process divides the data into two subsets: the complete data and the data containing missing values. The complete data is used to impute the missing values. The process is repeated for each attribute with missing values to generate a complete data set for classification. This research applies NBI for imputation as a preprocessing step in preparation for classification. The experiments compare NBI against mean and mode imputation for predicting the missing data. The full set of complete data is used as training data to predict the missing values, so that the imputation represents the entire data set. The results show that imputation with NBI yields higher accuracy than the other imputation methods. NBI with single imputation and with multiple imputation results in better performance, which the authors attribute to the right features. This study aims to calculate the effect of missing values on the Naive Bayes Imputation algorithm, which is based on a probabilistic model using mixed data. Empirically, the interaction between the imputation methods and supervised classification results in differences in classification performance for the same imputation method.
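The procedure described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes purely categorical attributes (the paper works with mixed data), uses Laplace smoothing with a hypothetical parameter `alpha`, and the function names `nb_impute`, `nb_impute_all` and `mode_impute` as well as the toy data are invented for the example. Missing values are represented as `None`.

```python
from collections import Counter, defaultdict
import math

def nb_impute(rows, target, alpha=1.0):
    """Fill None values in column `target` with a categorical naive Bayes
    model trained on the complete rows (assumes at least one exists)."""
    complete = [r for r in rows if all(v is not None for v in r)]
    predictors = [j for j in range(len(rows[0])) if j != target]

    class_counts = Counter(r[target] for r in complete)
    cond_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    levels = defaultdict(set)           # attribute index -> values seen in training
    for r in complete:
        for j in predictors:
            cond_counts[(j, r[target])][r[j]] += 1
            levels[j].add(r[j])

    def predict(row):
        best, best_score = None, -math.inf
        for c in sorted(class_counts):
            # log prior + sum of smoothed log likelihoods
            score = math.log(class_counts[c] / len(complete))
            for j in predictors:
                if row[j] is None:
                    continue  # skip predictors that are themselves missing
                num = cond_counts[(j, c)][row[j]] + alpha
                den = class_counts[c] + alpha * len(levels[j])
                score += math.log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best

    return [
        tuple(predict(r) if j == target and v is None else v
              for j, v in enumerate(r))
        for r in rows
    ]

def nb_impute_all(rows):
    """Repeat the imputation for every attribute, left to right."""
    for j in range(len(rows[0])):
        rows = nb_impute(rows, j)
    return rows

def mode_impute(rows, target):
    """Baseline: fill None in column `target` with the most frequent value."""
    observed = [r[target] for r in rows if r[target] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [tuple(mode if j == target and v is None else v
                  for j, v in enumerate(r)) for r in rows]

# Toy data (invented): outlook, temperature, play
rows = [
    ("sunny", "hot", "no"),
    ("sunny", "mild", "no"),
    ("rainy", "mild", "yes"),
    ("rainy", "cool", "yes"),
    ("overcast", "hot", "yes"),
    ("sunny", "hot", None),
    ("rainy", None, "yes"),
]
filled = nb_impute_all(rows)
baseline = mode_impute(rows, 2)
```

On this toy data the naive Bayes sketch fills the missing `play` value of the sunny, hot row with "no", because the observed attributes of that row resemble the "no" rows, while the mode baseline fills it with the overall majority value "yes". This is only meant to show how a probability-based imputation can differ from a mode-based one; it says nothing about the accuracy figures reported in the paper.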
Pages: 10