Estimating missing data using novel correlation maximization based methods

Cited by: 16
Authors
Sefidian, Amir Masoud [1]
Daneshpour, Negin [1]
Affiliations
[1] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Keywords
Missing values; Imputation; Correlation; Regression; FUZZY C-MEANS; K-NEAREST NEIGHBORS; VALUE IMPUTATION; GENETIC ALGORITHM; VALUES; CLASSIFICATION; REGRESSION; FRAMEWORK; SELECTION; PATTERNS;
DOI
10.1016/j.asoc.2020.106249
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The accurate estimation of missing data plays a vital role in ensuring a high level of data quality. Missing values should be imputed before performing data mining, machine learning, and other data processing tasks. Ten correlation-based imputation methods are proposed in this paper. All of these methods try to maximize the correlation between a missing feature and the other features. The maximization is achieved by selecting segments of data that have strong correlations. The proposed approach involves the following main steps to impute each missing instance. First, a base set is selected from the complete instances. Second, data segments with strong correlations are generated using the base set and the rest of the complete instances. Finally, each missing value is imputed by applying linear models to the discovered segments of data. This study considers seven real datasets from different fields with different missing rates. The imputation quality of the proposed methods is compared with that of seven other imputation approaches in terms of three well-known evaluation criteria. The experimental results reveal that the proposed approach has better imputation performance than competing imputation techniques in most cases. © 2020 Elsevier B.V. All rights reserved.
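The three steps named in the abstract (base set, correlated segment, linear model) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not say how the base set is chosen or how segments are grown, so the nearest-neighbour base set, the greedy "keep the row if the absolute Pearson correlation does not drop" rule, the single-predictor regression, and the names `abs_corr`, `impute_from_correlated_segment`, and `base_size` are all assumptions made here for illustration.

```python
import numpy as np


def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 when either vector is constant."""
    if np.std(a) == 0 or np.std(b) == 0:
        return 0.0
    return abs(np.corrcoef(a, b)[0, 1])


def impute_from_correlated_segment(complete, query, target, predictor, base_size=5):
    """Estimate query[target] from a strongly correlated segment of complete rows.

    complete  : 2-D array of instances with no missing values
    query     : 1-D instance whose value at index `target` is missing (NaN)
    target    : column index of the missing feature
    predictor : column index of the feature used in the linear model
    """
    complete = np.asarray(complete, dtype=float)
    query = np.asarray(query, dtype=float)
    observed = [j for j in range(complete.shape[1]) if j != target]

    # Step 1 (assumed rule): base set = complete rows nearest to the query,
    # measured on the features the query actually has.
    dist = np.linalg.norm(complete[:, observed] - query[observed], axis=1)
    order = np.argsort(dist)
    segment = list(order[:base_size])
    best = abs_corr(complete[segment, predictor], complete[segment, target])

    # Step 2 (assumed rule): visit the remaining complete rows from nearest to
    # farthest and keep a row only if the correlation does not decrease.
    for i in order[base_size:]:
        trial = segment + [i]
        c = abs_corr(complete[trial, predictor], complete[trial, target])
        if c >= best:
            segment, best = trial, c

    # Step 3: fit a simple linear model on the segment and predict the gap.
    slope, intercept = np.polyfit(
        complete[segment, predictor], complete[segment, target], 1
    )
    return slope * query[predictor] + intercept
```

A call such as `impute_from_correlated_segment(data, np.array([4.0, np.nan]), target=1, predictor=0)` returns one estimate for the missing entry. A fuller version would consider several predictor features and handle a base set whose predictor column is constant, which this sketch does not.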
Pages: 30