Missing data imputation by K nearest neighbours based on grey relational structure and mutual information

Cited by: 97
Authors
Pan, Ruilin [1]
Yang, Tingsheng [1]
Cao, Jianhua [1]
Lu, Ke [1]
Zhang, Zhanchao [1]
Affiliations
[1] Anhui Univ Technol, Sch Management Sci & Engn, Maanshan 243032, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Missing data; Grey theory; Mutual information; Feature relevance; K nearest neighbours; FEATURE-SELECTION; ALGORITHM
DOI
10.1007/s10489-015-0666-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Treatment of missing data has become increasingly important in scientific research and engineering applications. The classic imputation strategy based on K nearest neighbours (KNN) has been widely used to address this problem. However, previous studies have paid little attention to feature relevance, which has a significant impact on the selection of nearest neighbours. As a result, similarity measurements may be biased. In this paper, we propose a novel method for imputing missing data, named the feature weighted grey KNN (FWGKNN) imputation algorithm. This approach employs mutual information (MI) to measure feature relevance. We present an experimental evaluation on five UCI datasets under three missingness mechanisms with various missing rates. Experimental results show that feature relevance has a non-negligible influence on missing data estimation based on grey theory, and that our method is superior to the other four estimation strategies. Moreover, using our approach significantly reduces classification bias in classification tasks.
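The idea summarised in the abstract (weight features by mutual information, measure similarity with grey relational analysis, then impute from the K most similar complete instances) can be illustrated with a minimal sketch. This is not the authors' implementation. The histogram-based MI estimator, the min-max scaling, the distinguishing coefficient rho = 0.5, the grade-weighted averaging, and the names histogram_mi, grey_relational_grades and fwgknn_impute are all assumptions made for illustration, and only numeric features are handled.

```python
import numpy as np

def histogram_mi(x, y, bins=8):
    """Plug-in mutual information estimate (nats) from a 2-D histogram (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def grey_relational_grades(target, candidates, weights, rho=0.5):
    """Weighted grey relational grade of every candidate row with respect to target."""
    delta = np.abs(candidates - target)
    dmin, dmax = delta.min(), delta.max()
    if dmax == 0:  # all candidates identical to the target
        return np.ones(len(candidates))
    coeff = (dmin + rho * dmax) / (delta + rho * dmax)  # grey relational coefficients
    return coeff @ weights

def fwgknn_impute(X, k=3, rho=0.5, bins=8):
    """Fill NaNs using MI-weighted grey relational KNN over the complete rows (sketch)."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]
    if len(complete) == 0:
        raise ValueError("at least one complete row is required")
    # Min-max scale with statistics of the complete rows so features are comparable.
    lo, hi = complete.min(axis=0), complete.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled_complete = (complete - lo) / span
    for i, j in zip(*np.where(np.isnan(X))):
        obs = np.where(~np.isnan(X[i]))[0]
        if obs.size == 0:  # nothing to compare on: fall back to the column mean
            out[i, j] = complete[:, j].mean()
            continue
        # Relevance of each observed feature to the feature being imputed.
        mi = np.array([histogram_mi(complete[:, p], complete[:, j], bins) for p in obs])
        weights = mi / mi.sum() if mi.sum() > 0 else np.full(obs.size, 1.0 / obs.size)
        target = (X[i, obs] - lo[obs]) / span[obs]
        grades = grey_relational_grades(target, scaled_complete[:, obs], weights, rho)
        nearest = np.argsort(grades)[-k:]  # the k largest grades are the nearest neighbours
        out[i, j] = np.average(complete[nearest, j], weights=grades[nearest])
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.uniform(0, 10, size=40)
    demo = np.column_stack([a, rng.uniform(0, 1, size=40), 2 * a])
    demo[3, 2] = np.nan  # hide one value of the feature that depends on column 0
    print(fwgknn_impute(demo, k=3)[3])
```

In this sketch a larger grey relational grade means a closer neighbour, so the K largest grades are selected, and features that share more information with the missing feature contribute more to the grade.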
Pages: 614 - 632
Number of pages: 19
Related papers
50 in total
  • [21] Improved methods for the imputation of missing data by nearest neighbor methods
    Tutz, Gerhard
    Ramzan, Shahla
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2015, 90 : 84 - 99
  • [22] Assessing the Impact of Distance Functions on K-Nearest Neighbours Imputation of Biomedical Datasets
    Santos, Miriam S.
    Abreu, Pedro H.
    Wilk, Szymon
    Santos, Joao
    ARTIFICIAL INTELLIGENCE IN MEDICINE (AIME 2020), 2020, : 486 - 496
  • [23] Remote Sensing Image Registration Based on Mutual Information and Grey Relational Analysis
    Wen Hong-yan
    Gao Jing-tao
    2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INTELLIGENT SYSTEMS, PROCEEDINGS, VOL 4, 2009, : 127 - +
  • [24] On the Use of Weighted k-Nearest Neighbors for Missing Value Imputation
    Lim, Chanhui
    Kim, Dongjae
    KOREAN JOURNAL OF APPLIED STATISTICS, 2015, 28 (01) : 23 - 31
  • [25] A Hierarchical Missing Value Imputation Method by Correlation-Based K-Nearest Neighbors
    Liu, Xin
    Lai, Xiaochen
    Zhang, Liyong
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2020, 1037 : 486 - 496
  • [26] Imputation of missing values in well log data using k-nearest neighbor collaborative filtering
    Kim, Min Jun
    Cho, Yongchae
    COMPUTERS & GEOSCIENCES, 2024, 193
  • [27] MissII: Missing Information Imputation for Traffic Data
    Hou, Mingliang
    Tang, Tao
    Xia, Feng
    Sultan, Ibrahim
    Kaur, Roopdeep
    Kong, Xiangjie
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2024, 12 (03) : 752 - 765
  • [28] Imputation of missing information in worldwide patent data
    de Rassenfosse, Gaetan
    Seliger, Florian
    DATA IN BRIEF, 2021, 34
  • [29] Single imputation with multilayer perceptron and multiple imputation combining multilayer perceptron and k-nearest neighbours for monotone patterns
    Silva-Ramirez, Esther-Lydia
    Pino-Mejias, Rafael
    Lopez-Coello, Manuel
    APPLIED SOFT COMPUTING, 2015, 29 : 65 - 74
  • [30] Optimal location query based on k nearest neighbours
    Yubao LIU
    Zitong CHEN
    Ada WaiChee FU
    Raymond ChiWing WONG
    Genan DAI
    Frontiers of Computer Science, 2021, (02) : 101 - 113