Correlated Cluster-Based Imputation for Treatment of Missing Values

Cited by: 5
Authors
Myneni, Madhu Bala [1]
Srividya, Y. [1]
Dandamudi, Akhil [2]
Affiliations
[1] Inst Aeronaut Engn, Hyderabad, Andhra Prades, India
[2] NIIT Univ, Neemrana, Rajasthan, India
Keywords
Missing values; Imputation methods; Clustering; Correlation
DOI
10.1007/978-981-10-2471-9_17
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Improved imputation plays a major role in research on data preprocessing for data analysis. Missing values are commonly treated with traditional approaches such as attribute mean/mode substitution and cluster-based mean/mode substitution. These approaches concentrate mainly on the attribute that contains the missing values. This paper presents a framework for correlated cluster-based imputation to improve the quality of data for data mining applications. We apply correlation analysis to the data set with respect to the attributes that have missing data. Based on the highly correlated attributes, the data set is divided into clusters using a suitable clustering technique, and each missing value is imputed with the corresponding cluster mean. This correlated cluster-based imputation improves the quality of the data. The imputed data are analyzed with K-Nearest Neighbor (KNN) and J48 Decision Tree multi-class classifiers. The effectiveness of the imputation is confirmed by 100% accuracy on data imputed with correlated cluster means, compared with data imputed with attribute means.
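The pipeline described in the abstract (correlation analysis, clustering on the highly correlated attributes, cluster-mean imputation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pearson`, `kmeans_1d`, and `correlated_cluster_impute` are hypothetical, k-means stands in for the unspecified "suitable clustering technique", only the single most correlated attribute is used, and all attributes other than the target are assumed to be complete.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def kmeans_1d(values, k, iters=50, seed=0):
    """Plain Lloyd's k-means on scalars; returns one cluster label per value.
    Assumption: k-means is only a stand-in for the paper's clustering step."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(values)), k)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move every non-empty center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def correlated_cluster_impute(rows, target, k=2):
    """Fill None entries in column `target` with the mean of `target`
    inside the row's cluster, where clusters are built on the column
    most correlated with `target`. Other columns must be complete."""
    n_cols = len(rows[0])
    complete = [r for r in rows if r[target] is not None]
    observed = [r[target] for r in complete]
    # Step 1: find the column most correlated with the target (observed rows only).
    best = max(
        (c for c in range(n_cols) if c != target),
        key=lambda c: abs(pearson([r[c] for r in complete], observed)),
    )
    # Step 2: cluster all rows on that column.
    labels = kmeans_1d([r[best] for r in rows], k)
    # Step 3: replace each missing value by the observed mean of its cluster,
    # falling back to the attribute mean if the cluster has no observed value.
    attribute_mean = sum(observed) / len(observed)
    filled = []
    for row, lab in zip(rows, labels):
        row = list(row)
        if row[target] is None:
            obs = [q[target] for q, ql in zip(rows, labels)
                   if ql == lab and q[target] is not None]
            row[target] = sum(obs) / len(obs) if obs else attribute_mean
        filled.append(row)
    return filled
```

For a toy data set such as `[[1.0, 5.0, 10.0], [1.2, 1.0, 12.0], [0.9, 3.0, None], [5.0, 2.0, 50.0], [5.2, 4.0, 52.0], [4.9, 1.5, None]]` with `target=2`, the two missing entries are filled from their own clusters (11.0 and 51.0), whereas plain attribute mean substitution would assign 31.0 to both.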
Pages: 171-178
Page count: 8