Missing data imputation by K nearest neighbours based on grey relational structure and mutual information

Cited by: 91
Authors
Pan, Ruilin [1]
Yang, Tingsheng [1]
Cao, Jianhua [1]
Lu, Ke [1]
Zhang, Zhanchao [1]
Affiliations
[1] Anhui Univ Technol, Sch Management Sci & Engn, Maanshan 243032, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Missing data; Grey theory; Mutual information; Feature relevance; K nearest neighbours; FEATURE-SELECTION; ALGORITHM;
DOI
10.1007/s10489-015-0666-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Treatment of missing data has become increasingly significant in scientific research and engineering applications. The classic imputation strategy based on K nearest neighbours (KNN) has been widely used to address this problem. However, previous studies have paid little attention to feature relevance, which has a significant impact on the selection of nearest neighbours. As a result, similarity measurements may be biased. In this paper, we propose a novel method to impute missing data, named the feature weighted grey KNN (FWGKNN) imputation algorithm. This approach employs mutual information (MI) to measure feature relevance. We present an experimental evaluation on five UCI datasets under three missingness mechanisms with various missing rates. Experimental results show that feature relevance has a non-negligible influence on missing data estimation based on grey theory, and that our method is superior to the other four estimation strategies. Moreover, the classification bias can be significantly reduced by using our approach in classification tasks.
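The abstract does not give the formulas behind FWGKNN, so the following is only a minimal sketch of the general idea (MI-weighted grey relational similarity feeding a KNN estimate), not the authors' implementation. Everything specific here is an assumption: the function names `mutual_information` and `fwgknn_impute` are hypothetical, MI is estimated from an equal-width histogram, similarity uses Deng's grey relational coefficient with distinguishing coefficient `rho = 0.5`, the feature weights are the normalised MI values, and the estimate is the grade-weighted mean of the k most similar complete rows. Features are assumed to be numeric and already scaled to a common range.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Histogram estimate of mutual information (in nats) between two numeric columns."""
    def discretize(values):
        lo, hi = min(values), max(values)
        if hi == lo:  # a constant column falls into a single bin
            return [0] * len(values)
        return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def fwgknn_impute(complete, query, target, k=3, rho=0.5, bins=4):
    """Estimate query[target] from fully observed rows (illustrative sketch only).

    Assumes query has at least one observed feature besides the target.
    """
    feats = [j for j, v in enumerate(query) if j != target and v is not None]
    target_col = [row[target] for row in complete]

    # Assumption: feature weight = normalised MI between that feature and the target.
    mi = [mutual_information([row[j] for row in complete], target_col, bins)
          for j in feats]
    total = sum(mi)
    weights = [m / total for m in mi] if total > 0 else [1 / len(feats)] * len(feats)

    # Absolute differences between the query and every complete row, per feature.
    diffs = [[abs(query[j] - row[j]) for j in feats] for row in complete]
    dmin = min(min(d) for d in diffs)
    dmax = max(max(d) for d in diffs)

    def grade(d):
        """Weighted grey relational grade (Deng's coefficient, weighted sum)."""
        if dmax == 0:  # every row is identical to the query on the observed features
            return 1.0
        return sum(w * (dmin + rho * dmax) / (dj + rho * dmax)
                   for w, dj in zip(weights, d))

    grades = [grade(d) for d in diffs]
    nearest = sorted(range(len(complete)), key=lambda i: grades[i], reverse=True)[:k]
    norm = sum(grades[i] for i in nearest)
    return sum(grades[i] * complete[i][target] for i in nearest) / norm


# Toy data: column 2 (the target) tracks column 0; column 1 is unrelated to it.
complete = [
    [0.1, 0.9, 0.1],
    [0.2, 0.1, 0.2],
    [0.8, 0.8, 0.8],
    [0.9, 0.2, 0.9],
]
query = [0.85, 0.15, None]
estimate = fwgknn_impute(complete, query, target=2, k=2)
print(estimate)
```

In this toy data the unrelated column receives zero weight, so the two neighbours are chosen by the first column alone. With equal weights the row `[0.2, 0.1, 0.2]` would compete for second place because it matches the query on the irrelevant column, which is the kind of bias in neighbour selection that the abstract attributes to ignoring feature relevance.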
Pages: 614 - 632
Page count: 19
Related papers
50 in total
  • [1] Missing data imputation by K nearest neighbours based on grey relational structure and mutual information
    Ruilin Pan
    Tingsheng Yang
    Jianhua Cao
    Ke Lu
    Zhanchao Zhang
    [J]. Applied Intelligence, 2015, 43 : 614 - 632
  • [2] K nearest neighbours with mutual information for simultaneous classification and missing data imputation
    Garcia-Laencina, Pedro J.
    Sancho-Gomez, Jose-Luis
    Figueiras-Vidal, Anibal R.
    Verleysen, Michel
    [J]. NEUROCOMPUTING, 2009, 72 (7-9) : 1483 - 1493
  • [3] Grey Relational Analysis based k Nearest Neighbor Missing Data Imputation for Software Quality Datasets
    Huang, Jianglin
    Sun, Hongyi
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY (QRS 2016), 2016, : 86 - 91
  • [4] How distance metrics influence missing data imputation with k-nearest neighbours
    Santos, Miriam Seoane
    Abreu, Pedro Henriques
    Wilk, Szymon
    Santos, Joao
    [J]. PATTERN RECOGNITION LETTERS, 2020, 136 : 111 - 119
  • [5] Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach
    Erica Tavazzi
    Sebastian Daberdaku
    Rosario Vasta
    Andrea Calvo
    Adriano Chiò
    Barbara Di Camillo
    [J]. BMC Medical Informatics and Decision Making, 20
  • [6] Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach
    Tavazzi, Erica
    Daberdaku, Sebastian
    Vasta, Rosario
    Calvo, Andrea
    Chio, Adriano
    Di Camillo, Barbara
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2020, 20 (Suppl 5)
  • [7] K-Nearest Neighbor (K-NN) based Missing Data Imputation
    Murti, Della Murbarani Prawidya
    Wibawa, Aji Prasetya
    Akbar, Muhammad Iqbal
    Ianto, Utomo Puj
    [J]. 2019 5TH INTERNATIONAL CONFERENCE ON SCIENCE IN INFORMATION TECHNOLOGY (ICSITECH): EMBRACING INDUSTRY 4.0 - TOWARDS INNOVATION IN CYBER PHYSICAL SYSTEM, 2019, : 83 - 88
  • [8] Nearest neighbours in least-squares data imputation algorithms with different missing patterns
    Wasito, I
    Mirkin, B
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2006, 50 (04) : 926 - 949
  • [9] Differentially Private k-Nearest Neighbor Missing Data Imputation
    Clifton, Chris
    Hanson, Eric J.
    Merrill, Keith
    Merrill, Shawn
    [J]. ACM TRANSACTIONS ON PRIVACY AND SECURITY, 2022, 25 (03)
  • [10] Interpolation and K-Nearest Neighbours Combined Imputation for Longitudinal ICU Laboratory Data
    Daberdaku, Sebastian
    Tavazzi, Erica
    Di Camillo, Barbara
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2019, : 550 - 552