Shell-neighbor method and its application in missing data imputation

Cited by: 90
Authors
Zhang, Shichao [1, 2]
Affiliations
[1] Zhejiang Normal Univ, Dept Comp Sci, Jinhua, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210008, Peoples R China
Funding
Australian Research Council
Keywords
kNN; Shell-NN; Missing data imputation; Mining incomplete data; Incomplete data; Values
DOI
10.1007/s10489-009-0207-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data preparation is an important step in mining incomplete data. To address this problem, this paper introduces a new imputation approach called SN (Shell Neighbors) imputation, or simply SNI. SNI fills in an incomplete instance (one with missing values) in a given dataset using only its left and right nearest neighbors with respect to each factor (attribute), referred to as Shell Neighbors. The left and right nearest neighbors are selected from a set of nearest neighbors of the incomplete instance, and the size of this set is determined by cross-validation. SNI is then generalized to handle missing data in datasets with mixed attributes, for example continuous and categorical attributes. Experiments evaluating the proposed approach demonstrate that the generalized SNI method outperforms the kNN imputation method in both imputation accuracy and classification accuracy.
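The abstract gives only an outline of SNI, so the following is a minimal sketch of one possible reading for continuous attributes, not the paper's actual algorithm. The function name `shell_neighbor_impute`, the Euclidean distance over observed attributes, the restriction to complete rows as donors, and the use of the mean of the shell neighbors as the imputed value are all assumptions made for illustration.

```python
import numpy as np

def shell_neighbor_impute(X, row, k=5):
    """Fill the NaN entries of X[row] from its shell neighbors (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    target = X[row]
    missing = np.isnan(target)
    observed = ~missing

    # Assumption: only complete rows (other than the target) act as donors.
    complete = ~np.isnan(X).any(axis=1)
    complete[row] = False
    donors = X[complete]

    # Step 1: k nearest neighbors, by Euclidean distance on the observed attributes.
    dist = np.linalg.norm(donors[:, observed] - target[observed], axis=1)
    knn = donors[np.argsort(dist)[:k]]

    # Step 2: for each observed attribute, keep the closest neighbor on the
    # left (value <= target) and on the right (value >= target).
    shell = set()
    for a in np.flatnonzero(observed):
        diff = knn[:, a] - target[a]
        left = np.flatnonzero(diff <= 0)
        right = np.flatnonzero(diff >= 0)
        if left.size:
            shell.add(int(left[np.argmax(diff[left])]))
        if right.size:
            shell.add(int(right[np.argmin(diff[right])]))

    # Step 3 (assumption): impute with the mean over the shell neighbors.
    filled = target.copy()
    filled[missing] = knn[sorted(shell)][:, missing].mean(axis=0)
    return filled

# Toy data: the last row is missing its second attribute.
X = [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0], [9.0, 90.0], [3.0, float("nan")]]
print(shell_neighbor_impute(X, row=4, k=3))  # second entry becomes 30.0
```

In this toy example the three nearest neighbors have first-attribute values 1, 2 and 4, but only 2 (left) and 4 (right) form the shell, so the imputed value is 30.0 rather than the plain 3-NN mean of about 23.3. The abstract states that the neighbor-set size is chosen by cross-validation; here `k` is simply passed in by hand.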
Pages: 123-133
Page count: 11
Related papers
50 in total
  • [1] Shell-neighbor method and its application in missing data imputation
    Shichao Zhang
    Applied Intelligence, 2011, 35 : 123 - 133
  • [2] Application of Multiple Imputation Method for Missing Data Estimation
    Ser, Gazel
    GAZI UNIVERSITY JOURNAL OF SCIENCE, 2012, 25 (04): : 869 - 873
  • [3] Missing data imputation based on stochastic neighbor embedding
    Petrov, I. B.
    Ryazanov, V. V.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (ICPRAI 2018), 2018, : 698 - 701
  • [4] Improved methods for the imputation of missing data by nearest neighbor methods
    Tutz, Gerhard
    Ramzan, Shahla
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2015, 90 : 84 - 99
  • [5] A Hybrid Algorithm for Missing Data Imputation and Its Application to Electrical Data Loggers
    Crespo Turrado, Concepcion
    Sanchez Lasheras, Fernando
    Luis Calvo-Rolle, Jose
    Pinon-Pazos, Andres-Jose
    Melero, Manuel G.
    Javier de Cos Juez, Francisco
    SENSORS, 2016, 16 (09):
  • [6] Imputation of mean of ratios for missing data and its application to PPSWR sampling
    Zou, Guo Hua
    Li, Ying Fu
    Zhu, Rong
    Guan, Zhong
    ACTA MATHEMATICA SINICA-ENGLISH SERIES, 2010, 26 (05) : 863 - 874
  • [7] Imputation of Mean of Ratios for Missing Data and Its Application to PPSWR Sampling
    Guo Hua ZOU (Academy of Mathematics and Systems Science)
    Acta Mathematica Sinica,English Series, 2010, 26 (05) : 863 - 874
  • [8] Imputation of mean of ratios for missing data and its application to PPSWR sampling
    Guo Hua Zou
    Ying Fu Li
    Rong Zhu
    Zhong Guan
    Acta Mathematica Sinica, English Series, 2010, 26 : 863 - 874
  • [9] Application of the Modified Imputation Method to Missing Data to Increase Classification Performance
    Caparino, Elenita T.
    Sison, Ariel M.
    Medina, Ruji P.
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION SYSTEMS (ICCCS 2019), 2019, : 134 - 139
  • [10] Application of Multiple Imputation Method in Analyzing Data with Missing Continuous Covariates
    Tamar, S. Ghasemizadeh
    Ganjali, M.
    KOREAN JOURNAL OF APPLIED STATISTICS, 2008, 21 (04) : 659 - 664