Estimating missing data using novel correlation maximization based methods

Cited by: 16
Authors
Sefidian, Amir Masoud [1]
Daneshpour, Negin [1]
Affiliations
[1] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Keywords
Missing values; Imputation; Correlation; Regression; FUZZY C-MEANS; K-NEAREST NEIGHBORS; VALUE IMPUTATION; GENETIC ALGORITHM; VALUES; CLASSIFICATION; REGRESSION; FRAMEWORK; SELECTION; PATTERNS;
DOI
10.1016/j.asoc.2020.106249
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The accurate estimation of missing data plays a vital role in ensuring a high level of data quality. The missing values should be imputed before performing data mining, machine learning, and other data processing tasks. Ten correlation-based imputation methods are proposed in this paper. All of these methods try to maximize the correlation between a missing feature and other features. The maximization is achieved by selecting segments of data that have strong correlations. The proposed approach involves the following main steps to impute each missing instance. First, a base set is selected from complete instances. Second, data segments with strong correlations are generated using the base set and the rest of the complete instances. Finally, each missing value is imputed by applying linear models to the discovered segments of data. This study considers seven real datasets from different fields with different missing rates. The imputation quality of the proposed methods is compared to those of seven other imputation approaches in terms of three well-known evaluation criteria. The experimental results reveal that the proposed approach has better imputation performance than competing imputation techniques in most cases. (C) 2020 Elsevier B.V. All rights reserved.
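The three steps named in the abstract (select a base set, grow a strongly correlated segment, fit a linear model) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how the base set is chosen or how segments are scored, so the nearest-neighbour base set, the single-predictor setup, the Pearson-correlation criterion, and the names `impute_by_correlated_segment` and `base_size` are all assumptions made for illustration.

```python
import numpy as np


def impute_by_correlated_segment(X, row, target, predictor, base_size=3):
    """Estimate X[row, target] from X[row, predictor].

    Hypothetical sketch: the segment-growing rule below is an assumption,
    not the procedure from the paper.
    """
    # Use only complete instances (rows without any NaN) for modelling.
    complete = X[~np.isnan(X).any(axis=1)]
    x_val = X[row, predictor]

    # Assumed base set: complete rows whose predictor value is closest
    # to the predictor value of the incomplete row.
    order = np.argsort(np.abs(complete[:, predictor] - x_val))
    ordered = complete[order]

    # Grow the segment one instance at a time and remember the size that
    # gives the strongest absolute Pearson correlation.
    best_r, best_k = -1.0, base_size
    for k in range(base_size, len(ordered) + 1):
        seg = ordered[:k]
        if seg[:, predictor].std() == 0 or seg[:, target].std() == 0:
            continue  # correlation is undefined for a constant column
        r = abs(np.corrcoef(seg[:, predictor], seg[:, target])[0, 1])
        if r > best_r:
            best_r, best_k = r, k

    # Fit a simple linear model on the chosen segment and predict.
    seg = ordered[:best_k]
    slope, intercept = np.polyfit(seg[:, predictor], seg[:, target], 1)
    return slope * x_val + intercept
```

Calling the function on a two-column array whose last row has `np.nan` in column 1 returns an estimate for that cell based on the locally best-correlated complete rows. A fuller version would consider several predictor features and handle the case where too few complete instances exist.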
Pages: 30
Related papers
50 in total
  • [21] Regression analysis with missing covariate data using estimating equations
    Zhao, L. P.
    Lipsitz, S.
    Lew, D.
    Biometrics, 52 (04):
  • [22] Comparison of missing data imputation methods using weather data
    Nida, Hafiza
    Kashif, Muhammad
    Khan, Muhammad Imran
    Ghamkhar, Madiha
    PAKISTAN JOURNAL OF AGRICULTURAL SCIENCES, 2023, 60 (02) : 327 - 336
  • [23] Addressing the Curse of Missing Data in Clinical Contexts: A Novel Approach to Correlation-based Imputation
    Curioso, Isabel
    Santos, Ricardo
    Ribeiro, Bruno
    Carreiro, Andre
    Coelho, Pedro
    Fragata, Jose
    Gamboa, Hugo
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (06)
  • [24] New Chain Imputation Methods for Estimating Population Mean in the Presence of Missing Data Using Two Auxiliary Variables
    Bhushan, Shashi
    Pandey, Abhay Pratap
    COMMUNICATIONS IN MATHEMATICS AND STATISTICS, 2023, 11 (02) : 325 - 340
  • [25] New Chain Imputation Methods for Estimating Population Mean in the Presence of Missing Data Using Two Auxiliary Variables
    Shashi Bhushan
    Abhay Pratap Pandey
    Communications in Mathematics and Statistics, 2023, 11 : 325 - 340
  • [26] Impact of Missing Data on Correlation Coefficient Values: Deletion and Imputation Methods for Data Preparation
    Shantal, Mohammed
    Othman, Zalinda
    Abu Bakar, Azuraliza
    MALAYSIAN JOURNAL OF FUNDAMENTAL AND APPLIED SCIENCES, 2023, 19 (06) : 1052 - 1067
  • [27] The Effects of Model Based Missing Data Methods on Guessing Parameter in Case of Ignorable Missing Data
    Kocak, Duygu
    PEGEM EGITIM VE OGRETIM DERGISI, 2018, 8 (01) : 155 - 171
  • [28] Regression in the presence missing data using ensemble methods
    Hassan, Mostafa M.
    Atiya, Amir F.
    El-Gayar, Neamat
    El-Fouly, Raafat
    2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 1261 - +
  • [29] The restoration of missing data using Bayesian numerical methods
    Fitzgerald, WJ
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 2055 - 2055
  • [30] New imputation methods for missing data using quantiles
    Munoz, J. F.
    Rueda, M.
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2009, 232 (02) : 305 - 317