Superpopulation model inference for non probability samples under informative sampling with high-dimensional data

Cited by: 0
Authors:
Liu, Zhan [1 ]
Wang, Dianni [1 ]
Pan, Yingli [1 ]
Affiliations:
[1] Hubei Univ, Sch Math & Stat, Hubei Key Lab Appl Math, Wuhan 430062, Peoples R China
Keywords:
Non-probability samples; superpopulation model; informative sampling; high-dimensional data; variable selection
DOI:
10.1080/03610926.2024.2335543
Chinese Library Classification: O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes: 020208; 070103; 0714
Abstract:
Non-probability samples are widely used in many fields, but they suffer from selection bias because the selection probabilities are unknown. Superpopulation model inference methods have been proposed to address this problem, but these approaches require a non-informative sampling assumption. When the sampling mechanism is informative, that is, when the selection probabilities are related to the outcome variable, the earlier inference methods may be invalid. Moreover, a large number of covariates may be encountered in practice, which poses a further challenge for inference from non-probability samples under informative sampling. In this article, superpopulation model approaches under informative sampling with high-dimensional data are developed to perform valid inference from non-probability samples. Specifically, a semiparametric exponential tilting model is established to estimate the selection probabilities, and the sample distribution is derived for estimating the superpopulation model parameters. Moreover, SCAD, adaptive LASSO, and Model-X knockoffs are employed to select variables and estimate parameters in the superpopulation model. Asymptotic properties of the proposed estimators are established. Simulation studies compare the performance of the proposed estimators with that of a naive estimator that ignores informative sampling. The proposed methods are further applied to National Health and Nutrition Examination Survey data.
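To illustrate the two computational ingredients the abstract names, the sketch below simulates a non-probability sample whose inclusion probability depends on the outcome, then fits a weighted adaptive LASSO. All quantities here are illustrative assumptions: a known logistic selection model stands in for the paper's semiparametric exponential tilting estimator, and the sample sizes, coefficients, and penalty level are made up for the demonstration; this is a minimal sketch, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- simulate a finite population (sizes and coefficients are illustrative) ---
N, p = 20000, 20
X = rng.standard_normal((N, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -2.0, 1.5]          # sparse signal, rest are noise
y = X @ beta_true + rng.standard_normal(N)

# Informative selection: inclusion probability depends on the outcome y.
# (A known logistic form stands in for the paper's semiparametric
# exponential tilting model, which would estimate these probabilities.)
pi = 1.0 / (1.0 + np.exp(-(0.5 + 0.3 * y)))
S = rng.random(N) < pi                     # realized non-probability sample
Xs, ys, d = X[S], y[S], 1.0 / pi[S]        # d: inverse-probability weights
n = Xs.shape[0]

# Weighted centering removes the intercept from the penalized problem;
# sqrt(d) row-scaling then turns weighted least squares into ordinary LS.
w = d / d.sum()
sqd = np.sqrt(d)
Xr = (Xs - w @ Xs) * sqd[:, None]
yr = (ys - w @ ys) * sqd

# An initial weighted least-squares fit supplies adaptive weights 1/|b0|;
# rescaling columns by |b0| reduces adaptive LASSO to a plain LASSO.
b0 = np.linalg.lstsq(Xr, yr, rcond=None)[0]
Xa = Xr * np.abs(b0)

# Plain LASSO by coordinate descent: min 0.5*||yr - Xa b||^2 + lam*||b||_1
lam = 0.05 * n                             # illustrative penalty level
col_sq = (Xa ** 2).sum(axis=0)
b = np.zeros(p)
for _ in range(100):
    for j in range(p):
        r_j = yr - Xa @ b + Xa[:, j] * b[j]   # residual excluding column j
        rho = Xa[:, j] @ r_j
        b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]

beta_hat = b * np.abs(b0)                  # undo the adaptive rescaling
print(np.round(beta_hat[:5], 3))
```

With the outcome-dependent weights folded in, the three true signals are recovered close to their population values and the noise coefficients are shrunk to zero; dropping the weights (setting d to ones) mimics the naive estimator the paper uses as a benchmark.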
Pages: 1370-1390 (21 pages)