Nearest neighbours in least-squares data imputation algorithms with different missing patterns

Cited by: 21
Authors
Wasito, I
Mirkin, B
Affiliations
[1] Univ London Birkbeck Coll, Sch Comp Sci & Informat Syst, London WC1E 7HX, England
[2] IIUM, Fac Engn, Dept Elect & Comp Engn, Kuala Lumpur 53100, Malaysia
Keywords
least squares; nearest neighbours; singular value decomposition; missing data; random missing; restricted random missing; merged database missing
DOI
10.1016/j.csda.2004.11.009
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Methods for imputation of missing data in the so-called least-squares approximation approach, a non-parametric computationally efficient multidimensional technique, are experimentally compared. Contributions are made to each of the three components of the experiment setting: (a) algorithms to be compared, (b) data generation, and (c) patterns of missing data. Specifically, "global" methods for least-squares data imputation are reviewed and extensions to them are proposed based on the nearest neighbours (NN) approach. A conventional generator of mixtures of Gaussian distributions is theoretically analysed and, then, modified to scale clusters differently. Patterns of missing data are defined in terms of rows and columns according to three different mechanisms that are referred to as Random missings, Restricted random missings, and Merged database. It appears that NN-based versions almost always outperform their global counterparts. With the Random missings pattern, the winner is always the authors' two-stage method M, which combines global and local imputation algorithms. (c) 2004 Elsevier B.V. All rights reserved.
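The abstract contrasts "global" least-squares imputation with nearest-neighbour (NN) versions. As a rough illustration of that contrast, and not the authors' exact algorithms, the sketch below fills missing entries by iterating a low-rank SVD fit, first on the whole matrix and then on a block made of each incomplete row and its k nearest rows. The function names `svd_impute` and `nn_svd_impute`, the column-mean start, the distance measure, and all parameter defaults are assumptions made for this example.

```python
import numpy as np

def svd_impute(X, rank=1, n_iter=100, tol=1e-8):
    """Global sketch: fill NaNs by iterating a rank-`rank` SVD approximation.

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Observed entries stay fixed; only missing ones are updated.
        new_filled = np.where(missing, approx, X)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break
    return filled

def nn_svd_impute(X, k=5, rank=1):
    """Local sketch: impute each incomplete row from its k nearest rows only.

    Assumes each block (target row plus neighbours) has an observed value
    in every column.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        obs = ~np.isnan(X[i])
        diff = X[:, obs] - X[i, obs]
        shared = (~np.isnan(diff)).sum(axis=1)
        # Mean squared difference over the columns both rows have observed.
        d = np.where(shared > 0,
                     np.nansum(diff ** 2, axis=1) / np.maximum(shared, 1),
                     np.inf)
        d[i] = np.inf  # a row is not its own neighbour
        neighbours = np.argsort(d)[:k]
        block = np.vstack([X[i], X[neighbours]])
        out[i] = svd_impute(block, rank=rank)[0]
    return out
```

Calling `svd_impute(X)` or `nn_svd_impute(X, k=5)` on an array with `np.nan` marking the missing entries returns a completed copy. The local version refits the model on a small neighbourhood for each incomplete row, which is the general idea behind NN-based imputation, at the cost of one decomposition loop per incomplete row.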
Pages: 926-949
Number of pages: 24
Related papers
  • [1] Nearest neighbour approach in the least-squares data imputation algorithms
    Wasito, I
    Mirkin, B
    [J]. INFORMATION SCIENCES, 2005, 169 (1-2) : 1 - 25
  • [2] K nearest neighbours with mutual information for simultaneous classification and missing data imputation
    Garcia-Laencina, Pedro J.
    Sancho-Gomez, Jose-Luis
    Figueiras-Vidal, Anibal R.
    Verleysen, Michel
    [J]. NEUROCOMPUTING, 2009, 72 (7-9) : 1483 - 1493
  • [3] LEAST-SQUARES ALGORITHMS
    WAMPLER, RH
    [J]. AMERICAN STATISTICIAN, 1977, 31 (01) : 52 - 53
  • [4] How distance metrics influence missing data imputation with k-nearest neighbours
    Santos, Miriam Seoane
    Abreu, Pedro Henriques
    Wilk, Szymon
    Santos, Joao
    [J]. PATTERN RECOGNITION LETTERS, 2020, 136 : 111 - 119
  • [5] Data-Reuse Recursive Least-Squares Algorithms
    Paleologu, Constantin
    Benesty, Jacob
    Ciochina, Silviu
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 752 - 756
  • [6] TEST DATA FOR STATISTICAL ALGORITHMS - LEAST-SQUARES AND ANOVA
    HASTINGS, WK
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1972, 67 (340) : 874 - 879
  • [7] FAST LEAST-SQUARES ALGORITHMS
    DAVIDON, WC
    [J]. AMERICAN JOURNAL OF PHYSICS, 1977, 45 (03) : 260 - 262
  • [8] Estimation of missing data in analysis of covariance: A least-squares approach
    Ogbonnaya, Chibueze E.
    Uzochukwu, Emeka C.
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2016, 45 (07) : 1902 - 1909
  • [9] Least-squares parameter estimation for systems with irregularly missing data
    Ding, Feng
    Ding, Jie
    [J]. INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2010, 24 (07) : 540 - 553
  • [10] Missing data imputation by K nearest neighbours based on grey relational structure and mutual information
    Pan, Ruilin
    Yang, Tingsheng
    Cao, Jianhua
    Lu, Ke
    Zhang, Zhanchao
    [J]. APPLIED INTELLIGENCE, 2015, 43 (03) : 614 - 632