Missing data imputation by K nearest neighbours based on grey relational structure and mutual information

Cited by: 97
Authors
Pan, Ruilin [1]
Yang, Tingsheng [1]
Cao, Jianhua [1]
Lu, Ke [1]
Zhang, Zhanchao [1]
Affiliations
[1] Anhui Univ Technol, Sch Management Sci & Engn, Maanshan 243032, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Missing data; Grey theory; Mutual information; Feature relevance; K nearest neighbours; FEATURE-SELECTION; ALGORITHM;
DOI
10.1007/s10489-015-0666-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The treatment of missing data has become increasingly important in scientific research and engineering applications. The classic imputation strategy based on K nearest neighbours (KNN) has been widely used to address this problem. However, earlier studies paid little attention to feature relevance, which has a significant impact on the selection of nearest neighbours; as a result, similarity measurements may be biased. In this paper, we propose a novel method for imputing missing data, the feature weighted grey KNN (FWGKNN) imputation algorithm. This approach employs mutual information (MI) to measure feature relevance. We present an experimental evaluation on five UCI datasets under three missingness mechanisms with various missing rates. Experimental results show that feature relevance has a non-negligible influence on missing data estimation based on grey theory, and that our method performs better than the other four estimation strategies compared. Moreover, the classification bias can be significantly reduced by using our approach in classification tasks.
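The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' FWGKNN implementation: it assumes features scaled to [0, 1], uses the standard Deng grey relational coefficient with distinguishing coefficient rho = 0.5, and weights features by a simple histogram estimate of MI against class labels. The function names, the binning, and the choice of labels as the MI target are all assumptions made for illustration.

```python
import numpy as np

def mutual_information(x, y, bins=4):
    """Histogram estimate of MI (in nats) between numeric x and discrete labels y."""
    x = np.asarray(x, dtype=float)
    edges = np.histogram_bin_edges(x, bins=bins)
    x_binned = np.digitize(x, edges[1:-1])           # bin index in 0..bins-1
    classes, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((bins, len(classes)))
    np.add.at(joint, (x_binned, y_idx), 1)           # joint counts
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def grey_knn_impute(X, weights, k=3, rho=0.5):
    """Fill NaNs using the k complete rows with the highest weighted grey relational grade.

    Assumes every incomplete row has at least one observed feature and that
    at least k complete rows exist (illustrative simplifications).
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]
    filled = X.copy()
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[i])
        diff = np.abs(complete[:, obs] - X[i, obs])  # per-feature absolute differences
        dmin, dmax = diff.min(), diff.max()
        if dmax > 0:
            coef = (dmin + rho * dmax) / (diff + rho * dmax)  # grey relational coefficient
        else:
            coef = np.ones_like(diff)                # all candidates identical on observed features
        w = weights[obs]
        w = w / w.sum() if w.sum() > 0 else np.full(w.shape, 1.0 / w.size)
        grade = coef @ w                             # feature-weighted grey relational grade
        nn = np.argsort(-grade)[:k]                  # k most related complete rows
        nn_w = grade[nn] / grade[nn].sum()
        filled[i, ~obs] = nn_w @ complete[nn][:, ~obs]
    return filled

# Toy data: two well-separated groups; the last row is missing its third feature.
X_demo = np.array([
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [1.0, 0.9, 1.0],
    [0.9, 1.0, 0.9],
    [0.05, 0.05, np.nan],
])
labels_demo = np.array([0, 0, 1, 1])                 # labels for the four complete rows
weights_demo = np.array([
    mutual_information(X_demo[:4, j], labels_demo, bins=2) for j in range(3)
])
filled_demo = grey_knn_impute(X_demo, weights_demo, k=2)
```

In this sketch the weights only change which features dominate the grey relational grade; with uniform weights it reduces to an unweighted grey-relational KNN imputation.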
Pages: 614-632
Number of pages: 19