Estimating missing data using novel correlation maximization based methods

Cited by: 16
Authors
Sefidian, Amir Masoud [1]
Daneshpour, Negin [1]
Affiliations
[1] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Keywords
Missing values; Imputation; Correlation; Regression; Fuzzy c-means; K-nearest neighbors; Value imputation; Genetic algorithm; Values; Classification; Framework; Selection; Patterns
DOI
10.1016/j.asoc.2020.106249
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The accurate estimation of missing data plays a vital role in ensuring a high level of data quality. The missing values should be imputed before performing data mining, machine learning, and other data processing tasks. Ten correlation-based imputation methods are proposed in this paper. All of these methods try to maximize the correlation between a missing feature and other features. The maximization is achieved by selecting segments of data that have strong correlations. The proposed approach involves the following main steps to impute each missing instance. First, a base set is selected from complete instances. Second, data segments with strong correlations are generated using the base set and the rest of the complete instances. Finally, each missing value is imputed by applying linear models to the discovered segments of data. This study considers seven real datasets from different fields with different missing rates. The imputation quality of the proposed methods is compared to those of seven other imputation approaches in terms of three well-known evaluation criteria. The experimental results reveal that the proposed approach has better imputation performance than competing imputation techniques in most cases. © 2020 Elsevier B.V. All rights reserved.
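The three steps named in the abstract (select a base set, grow a strongly correlated segment, fit a linear model) could be sketched roughly as follows. This is only an illustration under assumptions that the abstract does not state: the base set is taken to be the nearest complete rows, the segment is grown greedily, and a single predictor with an ordinary least-squares line is used. The function `impute_one` and the parameter `base_size` are hypothetical names, not the paper's.

```python
import numpy as np

def impute_one(complete, row, target, base_size=5):
    """Hypothetical sketch: estimate row[target] from fully observed rows."""
    observed = [j for j in range(complete.shape[1])
                if j != target and not np.isnan(row[j])]

    # Step 1 (assumption): base set = complete rows nearest to the incomplete row.
    dist = np.linalg.norm(complete[:, observed] - row[observed], axis=1)
    order = np.argsort(dist)
    segment = list(order[:base_size])

    def abs_corr(idx, j):
        # Absolute Pearson correlation between feature j and the target on rows idx.
        x, y = complete[idx, j], complete[idx, target]
        if x.std() == 0 or y.std() == 0:
            return 0.0
        return abs(np.corrcoef(x, y)[0, 1])

    # Assumption: use the single observed feature most correlated with the target.
    predictor = max(observed, key=lambda j: abs_corr(segment, j))
    best = abs_corr(segment, predictor)

    # Step 2 (assumption): greedily add a remaining complete row only if the
    # correlation on the enlarged segment does not decrease.
    for i in order[base_size:]:
        candidate = abs_corr(segment + [i], predictor)
        if candidate >= best:
            segment.append(i)
            best = candidate

    # Step 3: fit a least-squares line on the segment and predict the missing value.
    slope, intercept = np.polyfit(complete[segment, predictor],
                                  complete[segment, target], 1)
    return slope * row[predictor] + intercept
```

Calling `impute_one(data, row, target=1)` with `data` holding only complete rows and `row` holding `np.nan` at index 1 returns one estimate; the ten methods in the paper presumably differ in how the base set and segments are chosen, which this sketch does not attempt to reproduce.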
Pages: 30
Related papers
50 entries in total
  • [41] Novel Methods for Imputing Missing Values in Water Level Monitoring Data
    Khampuengson, Thakolpat
    Wang, Wenjia
    WATER RESOURCES MANAGEMENT, 2023, 37 (02) : 851 - 878
  • [43] Missing data imputation using fuzzy-rough methods
    Amiri, Mehran
    Jensen, Richard
    NEUROCOMPUTING, 2016, 205 : 152 - 164
  • [44] Missing Data and Imputation Methods
    Schober, Patrick
    Vetter, Thomas R.
    ANESTHESIA AND ANALGESIA, 2020, 131 (05) : 1419 - 1420
  • [45] Estimating the position of mistracked coil of EMA Data using GMM-based methods
    Fang, Qiang
    Wei, Jianguo
    Hu, Fang
    Li, Aijun
    Wang, Haibo
    2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [46] Incremental general regression with expectation maximization for learning finite mixtures using data with missing values
    Abas, Ahmed R.
    WORLD CONGRESS ON COMPUTER & INFORMATION TECHNOLOGY (WCCIT 2013), 2013,
  • [47] Missing data imputation using machine learning based methods to improve HCC survival prediction
    Yumus, Mehmethan
    Apaydin, Merve
    Degirmenci, Ali
    Karal, Omer
    2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
  • [49] Expectation-Maximization Approach to Fault Diagnosis With Missing Data
    Zhang, Kangkang
    Gonzalez, Ruben
    Huang, Biao
    Ji, Guoli
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2015, 62 (02) : 1231 - 1240
  • [50] Missing data: A comparison of neural network and expectation maximization techniques
    Nelwamondo, Fulufhelo V.
    Mohamed, Shakir
    Marwala, Tshilidzi
    CURRENT SCIENCE, 2007, 93 (11) : 1514 - 1521