Simultaneous Semiparametric Estimation of Clustering and Regression

Authors
Marbac, Matthieu [1]
Sedki, Mohammed [2, 3]
Biernacki, Christophe [4 ]
Vandewalle, Vincent [5 ]
Affiliations
[1] Univ Rennes, CNRS, ENSAI, CREST UMR 9194, Rennes, France
[2] Univ Paris Saclay, Gif Sur Yvette, France
[3] INSERM, Paris, France
[4] Univ Lille, UMR Lab Paul Painleve, CNRS, INRIA, Lille, France
[5] Univ Lille, ULR 2694 METRICS Evaluat Technol Sante & Prat Med, CHU Lille, INRIA, Lille, France
Keywords
Clustering; Finite mixture; Regression model; Semiparametric model; VARIABLE SELECTION; MODELS; MIXTURES; COMPONENTS; NUMBER
DOI
10.1080/10618600.2021.2000872
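Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714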
Abstract
We investigate parameter estimation for regression models with fixed group effects when the group variable is missing but group-related variables are available. This problem involves clustering, to infer the missing group variable from the group-related variables, and regression, to model the target variable given the group and possibly some additional variables. The problem can therefore be formulated as modeling the joint distribution of the target and the group-related variables. The usual estimation strategy for this joint model is a two-step approach that first learns the group variable (clustering step) and then plugs in its estimator to fit the regression model (regression step). However, this approach is suboptimal (in particular, it yields biased regression estimates) because it does not use the target variable for clustering. We therefore advocate estimating the clustering and the regression simultaneously, in a semiparametric framework. Numerical experiments illustrate the benefits of our proposal across a wide range of distributions and regression models. The relevance of the new method is illustrated on real data concerning problems associated with high blood pressure prevention. The proposed approach is implemented in the R package ClusPred, available on CRAN. Supplementary materials containing the technical details and the R code are available online.
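The bias of the two-step approach described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method and does not use the ClusPred package: it assumes two Gaussian groups, a single group-related variable `x`, a target `y` whose mean depends only on the group, and a plain one-dimensional 2-means step for clustering. All function names and parameter values (`simulate`, `two_means_1d`, `delta`, `gap`) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n=5000, delta=1.5, gap=2.0, seed=0):
    """Draw a latent group z, a group-related variable x, and a target y."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)            # latent group (missing in practice)
    x = rng.normal(loc=delta * z, scale=1.0)  # overlapping clusters in x
    y = rng.normal(loc=gap * z, scale=1.0)    # fixed group effect on the target
    return x, y, z


def two_means_1d(x, n_iter=50):
    """Plain 2-means on a 1-D variable; label 1 is the cluster with larger x."""
    centers = np.array([x.min(), x.max()], dtype=float)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then update the centers.
        labels = (np.abs(x - centers[1]) < np.abs(x - centers[0])).astype(int)
        centers = np.array([x[labels == k].mean() for k in (0, 1)])
    return labels


def group_effect_gap(y, labels):
    """Difference between the mean of y in group 1 and in group 0."""
    return y[labels == 1].mean() - y[labels == 0].mean()


x, y, z = simulate()
z_hat = two_means_1d(x)                    # clustering step: ignores y
gap_two_step = group_effect_gap(y, z_hat)  # regression step on plugged-in labels
gap_oracle = group_effect_gap(y, z)        # infeasible benchmark using the true z

print(f"two-step estimate of the group effect: {gap_two_step:.3f}")
print(f"oracle estimate of the group effect:   {gap_oracle:.3f}")
```

Because the clusters in `x` overlap, some observations are assigned to the wrong group, and the plugged-in labels pull the two estimated group means toward each other. Under these assumptions the two-step estimate of the group effect falls below the oracle value, which is the kind of bias that motivates using the target variable during clustering.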
Pages: 477-485
Page count: 9
Related papers
  • [1] On Semiparametric Mode Regression Estimation
    Gannoun, Ali
    Saracco, Jerome
    Yu, Keming
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2010, 39 (07) : 1141 - 1157
  • [2] Estimation in semiparametric spatial regression
    Gao, Jiti
    Lu, Zudi
    Tjostheim, Dag
    [J]. ANNALS OF STATISTICS, 2006, 34 (03) : 1395 - 1435
  • [3] Semiparametric estimation of outbreak regression
    Frisen, Marianne
    Andersson, Eva
    Pettersson, Kjell
    [J]. STATISTICS, 2010, 44 (02) : 107 - 117
  • [4] Simultaneous variable selection and estimation in semiparametric regression of mixed panel count data
    Ge, Lei
    Hu, Tao
    Li, Yang
    [J]. BIOMETRICS, 2024, 80 (01)
  • [5] Estimation in semiparametric time series regression
    Chen, Jia
    Gao, Jiti
    Li, Degui
    [J]. STATISTICS AND ITS INTERFACE, 2011, 4 (02) : 243 - 251
  • [6] Semiparametric estimation of count regression models
    Department of Economics, Georgia State University, University Plaza, Atlanta, GA 30303, United States
    Unknown
    Unknown
    [J]. J Econom, 1 (123-150):
  • [7] Semiparametric estimation of count regression models
    Gurmu, S
    Rilstone, P
    Stern, S
    [J]. JOURNAL OF ECONOMETRICS, 1999, 88 (01) : 123 - 150
  • [8] Ridge estimation of a semiparametric regression model
    Hu, HC
    [J]. JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2005, 176 (01) : 215 - 222
  • [9] Semiparametric regression estimation in copula models
    Bagdonavicius, Vilijandas
    Malov, Sergey V.
    Nikulin, Mikhail S.
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2006, 35 (08) : 1449 - 1467
  • [10] On variance estimation in semiparametric regression models
    Cheng, FX
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2005, 34 (08) : 1737 - 1742