Missing data imputation by K nearest neighbours based on grey relational structure and mutual information

Citations: 97
Authors
Pan, Ruilin [1]
Yang, Tingsheng [1]
Cao, Jianhua [1]
Lu, Ke [1]
Zhang, Zhanchao [1]
Affiliations
[1] Anhui Univ Technol, Sch Management Sci & Engn, Maanshan 243032, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Missing data; Grey theory; Mutual information; Feature relevance; K nearest neighbours; FEATURE-SELECTION; ALGORITHM;
DOI
10.1007/s10489-015-0666-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Treatment of missing data has become increasingly significant in scientific research and engineering applications. The classic imputation strategy based on the K nearest neighbours (KNN) has been widely used to solve the plague problem. However, former studies do not give much attention to feature relevance, which has a significant impact on the selection of nearest neighbours. As a result, biased results may appear in similarity measurements. In this paper, we propose a novel method to impute missing data, named feature weighted grey KNN (FWGKNN) imputation algorithm. This approach employs mutual information (MI) to measure feature relevance. We present an experimental evaluation for five UCI datasets in three missingness mechanisms with various missing rates. Experimental results show that feature relevance has a non-ignorable influence on missing data estimation based on grey theory, and our method is considered superior to the other four estimation strategies. Moreover, the classification bias can be significantly reduced by using our approach in classification tasks.
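The abstract does not give the FWGKNN formulas, so the following is only a minimal sketch of the general idea it describes: choose nearest neighbours by a grey relational grade in which each feature is weighted by its mutual information with the feature being imputed. The grey relational coefficient used here is the standard form from grey relational analysis with distinguishing coefficient `rho`, the MI estimate is a simple histogram estimate, and every function name, parameter, and default (`mutual_information`, `grey_relational_grade`, `impute`, `k`, `bins`) is an assumption for illustration, not the authors' implementation.

```python
import numpy as np


def mutual_information(x, y, bins=4):
    """Histogram estimate of MI (in nats) between two 1-D arrays (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())


def grey_relational_grade(target, candidates, weights, rho=0.5):
    """Weighted grey relational grade of each candidate row with respect to target."""
    diff = np.abs(candidates - target)
    dmin, dmax = diff.min(), diff.max()
    if dmax == 0:                          # all candidates identical to the target
        return np.ones(len(candidates))
    coef = (dmin + rho * dmax) / (diff + rho * dmax)
    return coef @ weights


def impute(X, k=3, rho=0.5, bins=4):
    """Fill NaNs using the k complete rows with the highest weighted grade."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]
    lo, hi = complete.min(axis=0), complete.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled = (complete - lo) / span        # min-max scaling from complete rows
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        miss = np.isnan(X[i])
        obs = ~miss
        for j in np.where(miss)[0]:
            if not obs.any():              # nothing observed: fall back to the mean
                out[i, j] = complete[:, j].mean()
                continue
            # Feature weights: MI between each observed feature and feature j.
            mi = np.array([mutual_information(complete[:, f], complete[:, j], bins)
                           for f in np.where(obs)[0]])
            mi = np.clip(mi, 0.0, None)
            w = mi / mi.sum() if mi.sum() > 0 else np.full(mi.size, 1.0 / mi.size)
            row = (X[i, obs] - lo[obs]) / span[obs]
            grade = grey_relational_grade(row, scaled[:, obs], w, rho)
            nn = np.argsort(grade)[-k:]    # k most related complete rows
            out[i, j] = np.average(complete[nn, j], weights=grade[nn])
    return out
```

In this sketch a missing entry is estimated as the grade-weighted average of the same feature over the `k` most related complete rows, so the estimate always lies within the observed range of that feature. Categorical features, the three missingness mechanisms, and the evaluation protocol mentioned in the abstract are not covered.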
Pages: 614-632
Number of pages: 19