Regression Models Involving Nonlinear Effects With Missing Data: A Sequential Modeling Approach Using Bayesian Estimation

Cited by: 25
Authors
Luedtke, Oliver [1, 2]
Robitzsch, Alexander [1, 2]
West, Stephen G. [3]
Affiliations
[1] Leibniz Inst Sci & Math Educ, Dept Educ Measurement, Olshausenstr 62, D-24118 Kiel, Germany
[2] Ctr Int Student Assessment, Munich, Germany
[3] Arizona State Univ, Dept Psychol, Tempe, AZ 85287 USA
Keywords
multiple regression; missing data; interaction effects; multiple imputation; GENERALIZED LINEAR-MODELS; MAXIMUM-LIKELIHOOD-ESTIMATION; MULTIPLE-IMPUTATION; CHAINED EQUATIONS; MULTILEVEL MODELS; VALUES; DISTRIBUTIONS; INFERENCE; KURTOSIS; SKEWNESS
DOI
10.1037/met0000233
Chinese Library Classification (CLC) number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
When estimating multiple regression models with incomplete predictor variables, it is necessary to specify a joint distribution for the predictor variables. A convenient assumption is that this distribution is a joint normal distribution, the default in many statistical software packages. This distribution will in general be misspecified if the predictors with missing data have nonlinear effects (e.g., x²) or are included in interaction terms (e.g., x · z). In the present article, we discuss a sequential modeling approach that can be applied to decompose the joint distribution of the variables into 2 parts: (a) a part that is due to the model of interest and (b) a part that is due to the model for the incomplete predictors. We demonstrate how the sequential modeling approach can be used to implement a multiple imputation strategy based on Bayesian estimation techniques that can accommodate rather complex substantive regression models with nonlinear effects and also allows a flexible treatment of auxiliary variables. In 4 simulation studies, we showed that the sequential modeling approach can be applied to estimate nonlinear effects in regression models with missing values on continuous, categorical, or skewed predictor variables under a broad range of conditions and investigated the robustness of the proposed approach against distributional misspecifications. We developed the R package mdmb, which facilitates a user-friendly application of the sequential modeling approach, and we present a real-data example that illustrates the flexibility of the software.

Translational Abstract

Regression models testing whether two predictor variables interact to produce an effect on the outcome variable are commonly used in psychology. Often a portion of the participants do not fully complete their responses so their data are missing on one or both of the predictor variables. Although more modern methods of addressing missing data typically lead to more accurate results, the performance of these methods may be greatly diminished when regression models contain interactions or other nonlinear effects. We describe a new sequential modeling approach using multiple imputation that separates the problem into two parts: (a) the substantive regression model of interest and (b) the imputation model; this approach theoretically identifies when the two parts are compatible. When the two parts are compatible, Bayesian estimation can be used to produce accurate results. We show the improved performance of this sequential modeling approach relative to other forms of multiple imputation in four simulation studies under a broad range of conditions. The simulation studies considered most of the types of predictor variables commonly considered in research: normally distributed continuous, skewed continuous, binary, and latent. We developed the R package mdmb, which facilitates a user-friendly application of the sequential modeling approach, and we present a real-data example that illustrates the flexibility of the software. Annotated R computer script for the main analyses is presented in the online supplemental material.
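
The two-part decomposition described in the abstract can be written out for a single incomplete predictor. The notation below is an assumption made for illustration and does not appear in this record: y is the outcome, x the incomplete predictor, z a fully observed predictor, β and σ² the parameters of the substantive model, and γ the parameters of the predictor model. The interaction model in the comment is only one possible substantive model. This is a sketch of the general idea, not the article's exact formulation.

```latex
% Assumed substantive model with an interaction term (illustrative only):
%   y = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 x z + \varepsilon,
%   \varepsilon \sim N(0, \sigma^2)
\begin{align}
% Joint distribution split into (a) the model of interest
% and (b) the model for the incomplete predictor
f(y, x \mid z)
  &= \underbrace{f(y \mid x, z;\, \beta, \sigma^2)}_{\text{(a) model of interest}}
     \cdot
     \underbrace{f(x \mid z;\, \gamma)}_{\text{(b) model for the incomplete predictor}} \\
% Conditional distribution from which a missing value of x is imputed
f(x_{\mathrm{mis}} \mid y, z)
  &\propto f(y \mid x_{\mathrm{mis}}, z;\, \beta, \sigma^2)\,
           f(x_{\mathrm{mis}} \mid z;\, \gamma)
\end{align}
```

Because the imputation distribution in the second line is built from the substantive model itself, the product term x · z (or a squared term such as x²) enters through part (a) and does not have to be approximated by a joint normal distribution.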
Pages: 157-181
Page count: 25