Estimating missing data using novel correlation maximization based methods

Cited by: 16
Authors
Sefidian, Amir Masoud [1]
Daneshpour, Negin [1]
Affiliations
[1] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Keywords
Missing values; Imputation; Correlation; Regression; FUZZY C-MEANS; K-NEAREST NEIGHBORS; VALUE IMPUTATION; GENETIC ALGORITHM; VALUES; CLASSIFICATION; REGRESSION; FRAMEWORK; SELECTION; PATTERNS
DOI
10.1016/j.asoc.2020.106249
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The accurate estimation of missing data plays a vital role in ensuring a high level of data quality. The missing values should be imputed before performing data mining, machine learning, and other data processing tasks. Ten correlation-based imputation methods are proposed in this paper. All of these methods try to maximize the correlation between a missing feature and other features. The maximization is achieved by selecting segments of data that have strong correlations. The proposed approach involves the following main steps to impute each missing instance. First, a base set is selected from complete instances. Second, data segments with strong correlations are generated using the base set and the rest of the complete instances. Finally, each missing value is imputed by applying linear models to the discovered segments of data. This study considers seven real datasets from different fields with different missing rates. The imputation quality of the proposed methods is compared to those of seven other imputation approaches in terms of three well-known evaluation criteria. The experimental results reveal that the proposed approach has better imputation performance than competing imputation techniques in most cases. (C) 2020 Elsevier B.V. All rights reserved.
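The three steps named in the abstract (select a base set, grow a strongly correlated segment, fit a linear model) can be sketched roughly as follows. This is an illustrative sketch only, not one of the ten methods from the paper: the abstract does not say how the base set is chosen or how segments are grown, so the nearest-neighbour base set, the single-predictor choice, the greedy growth rule, and the names `abs_corr`, `impute_one`, and `base_size` are all assumptions made for this example.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 when either vector is constant."""
    if np.std(a) == 0 or np.std(b) == 0:
        return 0.0
    return abs(np.corrcoef(a, b)[0, 1])

def impute_one(complete, row, target, base_size=5):
    """Estimate row[target] (NaN) from a correlation-maximizing segment.

    complete: 2-D array of instances with no missing values.
    row:      1-D array whose entry at index `target` is missing.
    """
    observed = [c for c in range(complete.shape[1]) if c != target]
    # Assumption: use the single observed feature most correlated with the target.
    predictor = max(
        observed, key=lambda c: abs_corr(complete[:, c], complete[:, target])
    )

    # Step 1 (assumed rule): base set = complete instances nearest to the row.
    dist = np.linalg.norm(complete[:, observed] - row[observed], axis=1)
    order = np.argsort(dist)
    segment = list(order[:base_size])

    # Step 2 (assumed rule): greedily add instances that do not lower the
    # predictor-target correlation inside the segment.
    best = abs_corr(complete[segment, predictor], complete[segment, target])
    for i in order[base_size:]:
        trial = segment + [i]
        score = abs_corr(complete[trial, predictor], complete[trial, target])
        if score >= best:
            segment, best = trial, score

    # Step 3: fit a one-variable linear model on the segment and predict.
    x = complete[segment, predictor]
    y = complete[segment, target]
    if np.std(x) == 0:
        return float(np.mean(y))  # degenerate segment: fall back to the mean
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope * row[predictor] + intercept)
```

A call such as `impute_one(complete, row, target=1)` returns one estimate for the missing entry. Restricting the fit to a segment rather than all complete instances is the point of the approach: a linear model fitted where the correlation is strong should track the local relationship more closely than a global fit.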
Pages: 30
Related papers
50 in total
  • [31] Estimating Equations Inference With Missing Data
    Zhou, Yong
    Wan, Alan T. K.
    Wang, Xiaojing
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2008, 103 (483) : 1187 - 1199
  • [32] Nonparametric quantile regression with missing data using local estimating equations
    Wang, Chunyu
    Tian, Maozai
    Tang, Man-Lai
    JOURNAL OF NONPARAMETRIC STATISTICS, 2022, 34 (01) : 164 - 186
  • [33] Correlation when data are missing
    Parzen, M.
    Lipsitz, S.
    Metters, R.
    Fitzmaurice, G.
    JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY, 2010, 61 (06) : 1049 - 1056
  • [34] Estimating missing weather data for agricultural simulations using group method of data handling
    Acock, MC
    Pachepsky, YA
    JOURNAL OF APPLIED METEOROLOGY, 2000, 39 (07) : 1176 - 1184
  • [35] Handling estimating equation with nonignorably missing data based on SIR algorithm
    Wang, Xiuli
    Song, Yunquan
    Lin, Lu
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2017, 326 : 62 - 70
  • [36] Estimating spatial correlation structures based on CPT data
    Liu, Chia-Nan
    Chen, Chien-Hsun
    GEORISK-ASSESSMENT AND MANAGEMENT OF RISK FOR ENGINEERED SYSTEMS AND GEOHAZARDS, 2010, 4 (02) : 99 - 108
  • [37] Estimating Missing Unit Process Data in Life Cycle Assessment Using a Similarity-Based Approach
    Hou, Ping
    Cai, Jiarui
    Qu, Shen
    Xu, Ming
    ENVIRONMENTAL SCIENCE & TECHNOLOGY, 2018, 52 (09) : 5259 - 5267
  • [38] Estimating treatment effects from longitudinal clinical trial data with missing values: comparative analyses using different methods
    Houck, PR
    Mazumdar, S
    Koru-Sengul, T
    Tang, G
    Mulsant, BH
    Pollock, BG
    Reynolds, CF
    PSYCHIATRY RESEARCH, 2004, 129 (02) : 209 - 215
  • [39] Imputing the missing data in IoT based on the spatial and temporal correlation
    Mary, I. Priya Stella
    Arockiam, L.
    2017 IEEE INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN ADVANCED COMPUTING (ICCTAC), 2017,
  • [40] Estimating Missing Data and Determining the Confidence of the Estimate Data
    Mistry, Jaisheel
    Nelwamondo, Fulufhelo
    Marwala, Tshilidzi
    SEVENTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2008, : 752 - 755