Categorical missing data imputation approach via sparse representation

Cited by: 2
Authors
Shao, Xiaochen [1]
Wu, Sen [1]
Feng, Xiaodong [2]
Song, Rui [3]
Affiliations
[1] Univ Sci & Technol Beijing, Donlinks Sch Econ & Management, 30 Xueyuan Rd, Beijing 100083, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Polit Sci & Publ Adm, 2006 Xiyuan Rd, Chengdu 611731, Sichuan, Peoples R China
[3] Datang Telecom Technol & Ind Grp, 40 Xueyuan Rd, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
missing values; K-nearest neighbor; categorical attribute; data imputation; sparse representation; dictionary learning; locality constraint; lasso optimization; distance penalty; local smoothness
DOI
10.1504/IJSTM.2016.078542
Chinese Library Classification number
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
K-nearest neighbour (KNN) is an important method for imputing missing categorical data. Its effectiveness is highly sensitive to local parameters such as the choice of similarity function and the number of neighbours. To address these two issues, a categorical missing data imputation algorithm (CSR) is proposed. It first applies a matrix transform to make the categorical data more amenable to computation. It then introduces a locality constraint into sparse representation theory by using KNN to construct the dictionary. Next, it obtains a weight vector for each instance with missing values, one that reflects smoothness and local structure. Finally, using the sparse reconstruction coefficient vector, the algorithm fills in each missing attribute with the value that has the maximal reconstruction value. Empirical tests show that CSR outperforms KNNimpute (and its two derivative methods, IKNNimpute and SKNNimpute) and LLSimpute in terms of efficiency and stability.
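The four steps named in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not specify the matrix transform, the distance, or the optimiser. The sketch assumes one-hot encoding as the transform, Hamming distance on observed attributes for the KNN dictionary, and a distance-weighted lasso solved by plain proximal gradient (ISTA). The names `one_hot`, `weighted_lasso`, `csr_impute`, `k` and `lam` are hypothetical, and missing values are assumed to be coded as -1.

```python
import numpy as np

def one_hot(X, n_cats):
    """Assumed 'matrix transform': one-hot blocks; a missing entry (-1) gives an all-zero block."""
    blocks = []
    for j, k in enumerate(n_cats):
        B = np.zeros((X.shape[0], k))
        obs = np.flatnonzero(X[:, j] >= 0)
        B[obs, X[obs, j]] = 1.0
        blocks.append(B)
    return np.hstack(blocks)

def weighted_lasso(D, y, penalty, lam=0.1, n_iter=500):
    """Minimise 0.5*||y - D w||^2 + lam * sum_i penalty_i * |w_i| with ISTA."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = w - D.T @ (D @ w - y) / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam * penalty / L, 0.0)
    return w

def csr_impute(X, n_cats, k=3, lam=0.1):
    """Fill -1 entries; assumes every row has at least one observed attribute."""
    X = np.asarray(X)
    out = X.copy()
    H = one_hot(X, n_cats)
    offsets = np.concatenate(([0], np.cumsum(n_cats)))
    complete = np.flatnonzero((X >= 0).all(axis=1))
    for i in np.flatnonzero((X < 0).any(axis=1)):
        obs_attr = X[i] >= 0
        # Locality constraint: the dictionary is the k nearest complete rows.
        dist = (X[complete][:, obs_attr] != X[i, obs_attr]).mean(axis=1)
        nn = np.argsort(dist, kind="stable")[:k]
        rows = complete[nn]
        obs_cols = np.concatenate(
            [np.arange(offsets[j], offsets[j + 1]) for j in np.flatnonzero(obs_attr)]
        )
        # Distance penalty: farther neighbours are shrunk harder.
        w = weighted_lasso(H[rows][:, obs_cols].T, H[i, obs_cols], dist[nn], lam)
        recon = H[rows].T @ w  # reconstruction over all one-hot columns
        for j in np.flatnonzero(~obs_attr):
            # Pick the category with the maximal reconstruction value.
            out[i, j] = np.argmax(recon[offsets[j]:offsets[j + 1]])
    return out

X_demo = np.array([[0, 1, 0], [0, 1, 0], [1, 0, 1], [1, 0, 1], [0, 1, -1], [1, -1, 1]])
filled = csr_impute(X_demo, n_cats=[2, 2, 2], k=2)
print(filled)
```

In the toy data the two incomplete rows each match two complete rows exactly on their observed attributes, so the missing entries are filled with those neighbours' values. Restricting the dictionary to neighbours is what makes the sparse code local. The penalty weights then reduce sensitivity to the exact choice of `k`, since distant neighbours are driven towards zero weight.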
Pages: 256-270
Number of pages: 15