Optimal sampling allocation for outcome-dependent designs in cluster-correlated data settings

Authors
Rivera-Rodriguez, Claudia [1]
Haneuse, Sebastien [2]
Sauer, Sara [3]
Affiliations
[1] Univ Auckland, Dept Stat, Level 3,Bldg 303-329,38 Princes St, Auckland, New Zealand
[2] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
[3] Harvard Med Sch, Dept Global Hlth & Social Med, Boston, MA 02115 USA
Keywords
Calibration; generalized estimating equations; optimal allocation; outcome-dependent sampling; two-phase design; CALIBRATION ESTIMATORS; LOGISTIC-REGRESSION; 2-PHASE; PARAMETERS; MODELS;
DOI
10.1177/09622802221122423
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)];
Abstract
In clinical and public health studies, it is often the case that some variables relevant to the analysis are too difficult or costly to measure for all individuals in the population of interest. Rather, a subsample of these individuals must be identified for additional data collection. A sampling scheme that incorporates readily available information for the entire target population at the design stage can increase the statistical efficiency of the intended analysis. While there is no universally optimal sampling design, under certain principles and restrictions, a well-designed and efficient sampling strategy can be implemented. In two-phase designs, efficiency can be gained by stratifying on the outcome and/or auxiliary information that is known at phase I. Additional gains in efficiency can be obtained by determining the optimal allocation of the sample sizes across the strata, which depends on the quantity that is being estimated. In this paper, the inference is concerned with one or multiple regression parameter(s) where the study units are naturally clustered and, thus, exhibit correlation in outcomes. We propose several allocation strategies within the framework of two-phase designs for the estimation of the regression parameter(s) obtained from weighted generalized estimating equations. The proposed methods extend existing theory to address the objective of estimating regression parameters in cluster-correlated data settings by minimizing the asymptotic variance of the estimator subject to a fixed sample size. Through a comprehensive simulation study, we show that the proposed allocation schemes have the potential to yield substantial efficiency gains over alternative strategies.
Pages: 2400-2414
Page count: 15
Related papers
50 in total
  • [1] Outcome-dependent sampling in cluster-correlated data settings with application to hospital profiling
    McGee, Glen
    Schildcrout, Jonathan
    Normand, Sharon-Lise
    Haneuse, Sebastien
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2019, : 379 - 402
  • [2] Outcome-dependent sampling in cluster-correlated data settings with application to hospital profiling
    McGee, Glen
    Schildcrout, Jonathan
    Normand, Sharon-Lise
    Haneuse, Sebastien
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2020, 183 (01) : 379 - 402
  • [3] Optimal allocation in stratified cluster-based outcome-dependent sampling designs
    Sauer, Sara
    Hedt-Gauthier, Bethany
    Haneuse, Sebastien
    [J]. STATISTICS IN MEDICINE, 2021, 40 (18) : 4090 - 4107
  • [4] Practical strategies for operationalizing optimal allocation in stratified cluster-based outcome-dependent sampling designs
    Sauer, Sara
    Hedt-Gauthier, Bethany
    Haneuse, Sebastien
    [J]. STATISTICS IN MEDICINE, 2023, 42 (07) : 917 - 935
  • [5] On the analysis of two-phase designs in cluster-correlated data settings
    Rivera-Rodriguez, C.
    Spiegelman, D.
    Haneuse, S.
    [J]. STATISTICS IN MEDICINE, 2019, 38 (23) : 4611 - 4624
  • [6] Likelihood-based analysis of outcome-dependent sampling designs with longitudinal data
    Zelnick, Leila R.
    Schildcrout, Jonathan S.
    Heagerty, Patrick J.
    [J]. STATISTICS IN MEDICINE, 2018, 37 (13) : 2120 - 2133
  • [7] On the Analysis of Case-Control Studies in Cluster-correlated Data Settings
    Haneuse, Sebastien
    Rivera-Rodriguez, Claudia
    [J]. EPIDEMIOLOGY, 2018, 29 (01) : 50 - 57
  • [8] The effect of misspecification of random effects distributions in clustered data settings with outcome-dependent sampling
    Neuhaus, John M.
    Mcculloch, Charles E.
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2011, 39 (03) : 488 - 497
  • [9] On outcome-dependent sampling designs for longitudinal binary response data with time-varying covariates
    Schildcrout, Jonathan S.
    Heagerty, Patrick J.
    [J]. BIOSTATISTICS, 2008, 9 (04) : 735 - 749
  • [10] Causal inference in outcome-dependent two-phase sampling designs
    Wang, Weiwei
    Scharfstein, Daniel
    Tan, Zhiqiang
    MacKenzie, Ellen J.
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2009, 71 : 947 - 969