Shell-neighbor method and its application in missing data imputation

Times cited: 90
Author
Zhang, Shichao [1, 2]
Affiliations
[1] Zhejiang Normal University, Department of Computer Science, Jinhua, People's Republic of China
[2] Nanjing University, State Key Laboratory for Novel Software Technology, Nanjing 210008, People's Republic of China
Funding
Australian Research Council
Keywords
kNN; Shell-NN; Missing data imputation; Mining incomplete data; Incomplete data; Values
DOI
10.1007/s10489-009-0207-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data preparation is an important step in mining incomplete data. To address this problem, this paper introduces a new imputation approach called SN (Shell Neighbors) imputation, or simply SNI. SNI fills in an incomplete instance (one with missing values) in a given dataset using only its left and right nearest neighbors with respect to each factor (attribute), which are referred to as its Shell Neighbors. The left and right nearest neighbors are selected from a set of nearest neighbors of the incomplete instance, and the size of this set is determined by cross-validation. SNI is then generalized to handle missing data in datasets with mixed attributes, for example, continuous and categorical attributes. Experiments conducted to evaluate the proposed approach demonstrate that the generalized SNI method outperforms the kNN imputation method in both imputation accuracy and classification accuracy.
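The abstract describes SNI only at a high level, so the Python below is a minimal sketch of one possible reading of it, not the author's implementation. Everything beyond "pick left and right nearest neighbors per attribute from a kNN set, with k chosen by cross-validation" is an assumption made for illustration: a range-normalized mixed distance (scaled absolute difference for continuous attributes, 0/1 mismatch for categorical ones) computed over attributes observed in both rows; for each observed continuous attribute j, the left shell neighbor is the nearest member of the kNN set whose value on j is at most x[j], and the right one is the nearest whose value is at least x[j]; missing continuous values are filled with the shell neighbors' mean and categorical ones with their most frequent value; and k is picked by a simple leave-one-out scheme over the complete rows. The names `sni_impute`, `shell_neighbors`, `choose_k`, `distance`, and `MISSING` are hypothetical.

```python
import math
from collections import Counter
from statistics import mean

MISSING = None  # hypothetical marker for a missing value


def distance(a, b, numeric, ranges):
    """Mixed-attribute distance over attributes observed in both rows (assumed metric).

    Continuous: absolute difference divided by the attribute's range.
    Categorical: 0 if the values match, 1 otherwise.
    """
    total, used = 0.0, 0
    for j, (u, v) in enumerate(zip(a, b)):
        if u is MISSING or v is MISSING:
            continue
        if numeric[j]:
            total += (abs(u - v) / ranges[j]) ** 2 if ranges[j] else 0.0
        else:
            total += 0.0 if u == v else 1.0
        used += 1
    return math.sqrt(total / used) if used else math.inf


def shell_neighbors(x, knn, complete, numeric):
    """Indices of the left/right nearest neighbors of x on each observed
    continuous attribute, taken from the distance-ordered kNN index list."""
    shell = []
    for j, xj in enumerate(x):
        if xj is MISSING or not numeric[j]:
            continue
        left = next((i for i in knn if complete[i][j] <= xj), None)
        right = next((i for i in knn if complete[i][j] >= xj), None)
        for i in (left, right):
            if i is not None and i not in shell:
                shell.append(i)
    return shell or list(knn)  # assumption: fall back to plain kNN


def sni_impute(x, complete, numeric, k):
    """Fill the MISSING entries of x from its shell neighbors among `complete`."""
    ranges = [
        max(r[j] for r in complete) - min(r[j] for r in complete) if numeric[j] else 0.0
        for j in range(len(x))
    ]
    order = sorted(range(len(complete)),
                   key=lambda i: distance(x, complete[i], numeric, ranges))
    shell = shell_neighbors(x, order[:k], complete, numeric)
    filled = list(x)
    for j, xj in enumerate(x):
        if xj is MISSING:
            values = [complete[i][j] for i in shell]
            # continuous: mean; categorical: most frequent value
            filled[j] = mean(values) if numeric[j] else Counter(values).most_common(1)[0][0]
    return filled


def choose_k(complete, numeric, candidates):
    """Pick k by hiding each value of each complete row in turn and imputing it
    from the other rows (a simple leave-one-out scheme, assumed here).
    Continuous errors are absolute and not rescaled; categorical errors are 0/1."""
    best_k, best_err = None, math.inf
    for k in candidates:
        err = 0.0
        for i, row in enumerate(complete):
            others = complete[:i] + complete[i + 1:]
            for j in range(len(row)):
                probe = list(row)
                probe[j] = MISSING
                guess = sni_impute(probe, others, numeric, k)[j]
                err += abs(guess - row[j]) if numeric[j] else float(guess != row[j])
        if err < best_err:
            best_k, best_err = k, err
    return best_k


# Toy data: two continuous attributes and one categorical attribute.
data = [
    (1.0, 10.0, "a"),
    (2.0, 20.0, "a"),
    (3.0, 30.0, "b"),
    (4.0, 40.0, "b"),
    (5.0, 50.0, "b"),
]
numeric = [True, True, False]
k = choose_k(data, numeric, candidates=[2, 3, 4])
print(k, sni_impute((2.5, MISSING, "a"), data, numeric, k))
```

In this sketch the shell neighbors are a subset of the k nearest neighbors chosen so that, on every observed continuous attribute, the instance is bracketed from below and above; with k = 3 on the toy data, for instance, the missing second value of `(2.5, MISSING, "a")` is averaged from the rows with first values 2.0 and 3.0 rather than from all three neighbors. The fallback to plain kNN when no shell neighbor exists (for example, when only categorical attributes are observed) is an added assumption so that the function always returns a value.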
Pages: 123-133
Number of pages: 11
Related papers (50 in total)
  • [21] Multiple imputation for missing edge data: A predictive evaluation method with application to Add Health
    Wang, Cheng
    Butts, Carter T.
    Hipp, John R.
    Jose, Rupa
    Lakon, Cynthia M.
    Social Networks, 2016, 45: 89-98
  • [22] Application of Improved Multiple Imputation Method in the Estimation of the Outstanding Claims Reserve with Missing Data
    Yan, Chun
    Yang, Xiaowei
    Liu, Wei
    Liu, Jiahui
    Journal of Nonlinear and Convex Analysis, 2019, 20(7): 1405-1413
  • [23] Missing Data Imputation and Its Effect on the Accuracy of Classification
    Hunt, Lynette A.
    Data Science: Innovative Developments in Data Analysis and Clustering, 2017: 3-14
  • [24] Method of missing data imputation for multivariate time series
    Li Z.
    Zhang F.
    Wang Y.
    Tao Q.
    Li C.
    Chinese Institute of Electronics, 2018, 40: 225-230
  • [25] k-nearest neighbor imputation method and its application in fault diagnosis of industrial process
    Li, Yuan
    Wu, Jie
    Wang, Guo-Zhu
    Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University, 2015, 49(6): 830-836
  • [26] Robust imputation method for missing values in microarray data
    Yoon, Dankyu
    Lee, Eun-Kyung
    Park, Taesung
    BMC Bioinformatics, 2007, 8(Suppl 2)
  • [27] A Modified Imputation Method to Missing Data as a Preprocessing Technique
    Caparino, Elenita T.
    Sison, Ariel M.
    Medina, Ruji P.
    2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), 2018
  • [29] A robust missing value imputation method for noisy data
    Bing Zhu
    Changzheng He
    Panos Liatsis
    Applied Intelligence, 2012, 36: 61-74
  • [30] A New Method to Missing Value Imputation for Immunosignature Data
    Koshechkin, A. A.
    Andryushchenko, V. S.
    Zamyatin, A. V.
    Sovremennye Tehnologii v Medicine, 2019, 11(2): 19-23