On regression and classification with possibly missing response variables in the data

Times cited: 0
Authors
Mojirsheibani, Majid [1]
Pouliot, William [2]
Shakhbandaryan, Andre [1]
Affiliations
[1] Calif State Univ Northridge, Dept Math, Northridge, CA 91330 USA
[2] Univ Birmingham, Dept Econ, Birmingham, England
Funding
U.S. National Science Foundation
Keywords
Regression; Partially observed data; Kernel; Convergence; Classification; Margin condition; LINEAR-REGRESSION; CONVERGENCE; MARGIN; MODELS;
DOI
10.1007/s00184-023-00923-3
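Chinese Library Classification (CLC) numbers
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714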
Abstract
This paper considers kernel regression and classification when some response variables in the data may be unobserved, and the mechanism that causes the missingness may depend on both the predictors and the response. The proposed approach has two steps. First, we construct a family of models (possibly infinite-dimensional) indexed by the unknown parameter of the missing-probability mechanism. Second, we search an appropriate cover (or subclass) of this family for its empirically optimal member, in the sense of minimizing the mean squared prediction error. The main focus of the paper is the theoretical properties of the resulting estimators; the issue of identifiability is also addressed. Our methods use a data-splitting approach that is easy to implement. We derive exponential bounds on the performance of the resulting estimators, measured by their deviations from the true regression curve in general L_p norms, while allowing the size of the cover or subclass to grow with the sample size n. These bounds immediately yield various strong convergence results for the proposed estimators. As an application, we consider statistical classification based on the proposed regression estimators and study their rates of convergence under different settings. Although the results are stated mainly for kernel-type estimators, they can also be extended to other popular local-averaging methods such as nearest-neighbor and histogram estimators.
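The two-step, data-splitting procedure described in the abstract can be illustrated with a minimal Python sketch. Every concrete choice below is an assumption made for illustration and is not taken from the paper: a one-dimensional predictor, a hypothetical logistic response model `response_prob` for P(delta = 1 | X, Y) that depends on Y (so responses are missing not at random), a Gaussian kernel, an inverse-probability-weighted Nadaraya-Watson estimator, a small finite grid `cover` standing in for the cover of the parameter space, and a normalized weighted squared error on the held-out half as the selection criterion. The plug-in rule at the end mirrors the classification application for Y in {0, 1}.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def response_prob(theta, x, y):
    """Hypothetical logistic model for P(delta = 1 | X = x, Y = y).

    It depends on y, so the response is missing not at random (assumption).
    """
    a, b, c = theta
    return 1.0 / (1.0 + math.exp(-(a + b * x + c * y)))


def ipw_kernel_estimate(x0, train, theta, h):
    """Inverse-probability-weighted Nadaraya-Watson estimate of E[Y | X = x0].

    train holds (x, y, delta) triples; y is None whenever delta == 0.
    """
    num = den = 0.0
    for x, y, delta in train:
        k = gaussian_kernel((x0 - x) / h)
        den += k  # X is always observed, so every point enters the denominator
        if delta == 1:
            num += k * y / response_prob(theta, x, y)
    return num / den if den > 0 else 0.0


def select_theta(data, cover, h):
    """Data splitting: fit on the first half, then choose the theta in the finite
    cover with the smallest weighted squared prediction error on the second half.
    The normalized inverse-probability weighting is a hypothetical criterion."""
    half = len(data) // 2
    train, test = data[:half], data[half:]
    best_theta, best_err = None, float("inf")
    for theta in cover:
        err = w_sum = 0.0
        for x, y, delta in test:
            if delta == 1:  # only observed responses can be scored
                w = 1.0 / response_prob(theta, x, y)
                err += w * (y - ipw_kernel_estimate(x, train, theta, h)) ** 2
                w_sum += w
        err = err / w_sum if w_sum > 0 else float("inf")
        if err < best_err:
            best_theta, best_err = theta, err
    return best_theta, train


def plug_in_classifier(x0, train, theta, h):
    """Plug-in rule for Y in {0, 1}: predict 1 when the estimate exceeds 1/2."""
    return 1 if ipw_kernel_estimate(x0, train, theta, h) > 0.5 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    true_theta = (1.0, 0.0, -1.0)  # simulated MNAR mechanism
    data = []
    for _ in range(400):
        x = rng.uniform(-2.0, 2.0)
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * x)) else 0
        delta = 1 if rng.random() < response_prob(true_theta, x, y) else 0
        data.append((x, y if delta else None, delta))
    cover = [(a, 0.0, c) for a in (0.5, 1.0, 1.5) for c in (-1.5, -1.0, -0.5, 0.0)]
    theta_hat, train = select_theta(data, cover, h=0.3)
    print("selected theta:", theta_hat)
    print("predicted class at x = 1.5:", plug_in_classifier(1.5, train, theta_hat, 0.3))
```

Here `cover` is fixed for simplicity, whereas the abstract allows the size of the cover or subclass to grow with n. Under the true parameter the weighted numerator has conditional mean E[Y | X], which is why the inverse-probability weighting is used in this sketch.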
Pages: 607-648
Number of pages: 42
Related papers (50 in total)
  • [1] Estimating the density of a possibly missing response variable in nonlinear regression
    Mueller, Ursula U.
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2012, 142 (05) : 1198 - 1214
  • [2] Missing data imputation using classification and regression trees
    Chen, Cheng-Yang
    Chang, Yu-Wei
    [J]. PEERJ COMPUTER SCIENCE, 2024, 10
  • [4] Semiparametric Analysis of Isotonic Errors-in-Variables Regression Models with Missing Response
    Sun, Zhimeng
    Zhang, Zhongzhan
    Du, Jiang
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2012, 41 (11) : 2034 - 2060
  • [5] Spatio-Temporal Instrumental Variables Regression with Missing Data: A Bayesian Approach
    Nascimento, Marcus L.
    Goncalves, Kelly C. M.
    Mendonca, Mario Jorge
    [J]. COMPUTATIONAL ECONOMICS, 2023, 62 (01) : 29 - 47
  • [7] Missing Data Imputation for Continuous Variables Based on Multivariate Adaptive Regression Splines
    Sanchez Lasheras, Fernando
    Garcia Nieto, Paulino Jose
    Garcia-Gonzalo, Esperanza
    Argueso Gomez, Francisco
    Rodriguez Iglesias, Francisco Javier
    Suarez Sanchez, Ana
    Santos Rodriguez, Jesus Daniel
    Luisa Sanchez, Maria
    Gonzalez-Nuevo, Joaquin
    Bonavera, Laura
    Toffolatti, Luigi
    Fernandez Menendez, Susana del Carmen
    de Cos Juez, Francisco Javier
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2020, 2020, 12344 : 73 - 85
  • [8] High-dimensional variable selection in regression and classification with missing data
    Gao, Qi
    Lee, Thomas C. M.
    [J]. SIGNAL PROCESSING, 2017, 131 : 1 - 7
  • [9] Logistic regression analysis of randomized response data with missing covariates
    Hsieh, S. H.
    Lee, S. M.
    Shen, P. S.
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2010, 140 (04) : 927 - 940
  • [10] Nonparametric -type regression estimation under missing response data
    Luo, Shuanghua
    Zhang, Cheng-yi
    [J]. STATISTICAL PAPERS, 2016, 57 (03) : 641 - 664