Missing Value Imputation via Clusterwise Linear Regression

Cited by: 28
Authors
Karmitsa, Napsu [1]
Taheri, Sona [2]
Bagirov, Adil [2]
Makinen, Pauliina [1]
Affiliations
[1] Univ Turku, Dept Math & Stat, FI-20014 Turku, Finland
[2] Federat Univ Australia, Sch Sci Engn & Informat Technol, Mt Helen, Vic 3350, Australia
Funding
Academy of Finland;
Keywords
TV; Data analysis; incomplete data; imputation; clusterwise linear regression; nonsmooth optimization; MEMORY BUNDLE METHOD; OPTIMIZATION; CLASSIFICATION; METHODOLOGY; ALGORITHM;
DOI
10.1109/TKDE.2020.3001694
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces a new method for preprocessing incomplete data. The method is based on clusterwise linear regression and combines two well-known approaches to missing value imputation: linear regression and clustering. The idea is to approximate missing values using only those data points that are somewhat similar to the incomplete data point. Clustering-based imputation methods use a similar idea. Here, however, linear regression is applied within each cluster to accurately predict the missing values, and the regression is fitted simultaneously with the clustering. The proposed method is tested on synthetic and real-world data sets and compared with other missing value imputation algorithms. Numerical results show that the method produces the most accurate imputations on MCAR and MAR data sets that have a clear structure and no more than 25 percent missing data.
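The idea in the abstract, fitting a separate linear model per cluster and imputing from the model of the cluster an incomplete point belongs to, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the keywords indicate that the paper uses nonsmooth optimization, whereas the sketch swaps in a simple alternating (k-means-style) heuristic. The function names `fit_clr` and `impute_clr`, the restriction to a single incomplete column, and the rule that an incomplete row takes the cluster of its nearest complete row are all assumptions made for illustration.

```python
import numpy as np

def fit_clr(X, y, k, n_iter=50, seed=0):
    """Alternating heuristic for clusterwise linear regression (illustrative only)."""
    rng = np.random.default_rng(seed)
    A = np.hstack([X, np.ones((X.shape[0], 1))])   # append an intercept column
    labels = rng.integers(0, k, size=X.shape[0])   # random initial clusters
    coefs = np.zeros((k, A.shape[1]))
    for _ in range(n_iter):
        # Regression step: least-squares fit inside each cluster.
        for j in range(k):
            mask = labels == j
            if mask.sum() >= A.shape[1]:           # skip clusters that are too small
                coefs[j] = np.linalg.lstsq(A[mask], y[mask], rcond=None)[0]
        # Assignment step: move each point to the line that fits it best.
        sq_err = (A @ coefs.T - y[:, None]) ** 2   # shape (n_points, k)
        new_labels = sq_err.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return coefs, labels

def impute_clr(X, y, k=2, seed=0):
    """Fill NaNs in y; X is assumed complete (a simplifying assumption)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    coefs, labels = fit_clr(X[~missing], y[~missing], k, seed=seed)
    filled = y.copy()
    for i in np.flatnonzero(missing):
        # Assumption: an incomplete row joins the cluster of its nearest complete row.
        dists = np.linalg.norm(X[~missing] - X[i], axis=1)
        j = labels[dists.argmin()]
        filled[i] = np.append(X[i], 1.0) @ coefs[j]
    return filled
```

Calling `impute_clr(X, y, k=1)` reduces to ordinary regression imputation, while larger `k` lets each group of similar points use its own regression line. The alternating heuristic can stop in a poor local solution, so the result depends on the random initialization.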
Pages: 1889-1901
Number of pages: 13