Deterministic subsampling for logistic regression with massive data

Cited by: 1
Authors
Song, Yan [1]
Dai, Wenlin [1]
Affiliation
[1] Renmin University of China, Institute of Statistics and Big Data, Center for Applied Statistics, Beijing, People's Republic of China
Keywords
Leverage score; Linear classifier; Non-asymptotic property; Observed information; Statistical perspective; Leverage
DOI
10.1007/s00180-022-01319-z
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
For logistic regression with massive data, subsampling is an effective way to alleviate the computational burden. In contrast to most existing methods in the literature, which select subsamples randomly, we propose to obtain subsamples deterministically. Specifically, we use leverage scores to measure the influence of each sample on model fitting and deterministically select the samples with the highest scores. We also propose a faster alternative that mimics the leverage scores with a simple and intuitive form. Our methods pick subsamples suited to constructing a linear classification boundary and hence are more efficient when the subsample size is small. We derive non-asymptotic properties of the two methods regarding the observed information, prediction accuracy, and parameter estimation accuracy. Extensive simulation studies and two real-data applications validate the theoretical results and demonstrate the superiority of our methods.
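The selection step described in the abstract (score every sample by leverage, then keep the top-scoring ones without random draws) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the textbook logistic-regression leverage h_i = w_i x_i^T (X^T W X)^{-1} x_i with weights w_i = p_i(1 - p_i) evaluated at a pilot estimate `beta` that is simply passed in. The paper's exact score definition, how a pilot estimate is obtained, and its faster approximate scores are not given in this record, and the function names are hypothetical. Pure Python is used so the sketch has no dependencies.

```python
import math
from typing import List, Sequence


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def invert(a: List[List[float]]) -> List[List[float]]:
    # Gauss-Jordan elimination with partial pivoting on an augmented copy of `a`.
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("information matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def leverage_scores(x: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    # ASSUMPTION: standard weighted leverage at a given pilot estimate `beta`;
    # the paper's own definition may differ.
    w = []
    for xi in x:
        p = sigmoid(sum(a * b for a, b in zip(xi, beta)))
        w.append(p * (1.0 - p))  # w_i = p_i (1 - p_i)
    d = len(beta)
    # Information matrix M = sum_i w_i x_i x_i^T.
    info = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, x)) for k in range(d)]
            for j in range(d)]
    inv = invert(info)
    scores = []
    for wi, xi in zip(w, x):
        quad = sum(xi[j] * inv[j][k] * xi[k] for j in range(d) for k in range(d))
        scores.append(wi * quad)  # h_i = w_i x_i^T M^{-1} x_i
    return scores


def deterministic_subsample(x: Sequence[Sequence[float]], beta: Sequence[float],
                            k: int) -> List[int]:
    # Keep the k rows with the largest leverage scores. Ties are broken by row
    # index, so the selection involves no randomness at all.
    h = leverage_scores(x, beta)
    order = sorted(range(len(h)), key=lambda i: (-h[i], i))
    return sorted(order[:k])


# Usage: one covariate plus an intercept column, hypothetical pilot beta = (0, 1).
x = [[1.0, t] for t in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0)]
print(deterministic_subsample(x, [0.0, 1.0], 2))  # → [0, 6]
```

As a sanity check on the assumed definition, the scores sum to the number of parameters (the trace of the hat matrix), here 2. In this toy example the two most extreme covariate values are kept; the weights w_i shrink the scores of points far from the boundary, which is consistent with, though not a reproduction of, the abstract's remark that the selected subsamples cater to a linear classification boundary.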
Pages: 709–732 (24 pages)
Related papers (50 in total)
  • [1] Deterministic subsampling for logistic regression with massive data
    Song, Yan
    Dai, Wenlin
    Computational Statistics, 2024, 39: 709–732
  • [2] Robust and efficient subsampling algorithms for massive data logistic regression
    Jin, Jun
    Liu, Shuangzhe
    Ma, Tiefeng
    Journal of Applied Statistics, 2024, 51(8): 1427–1445
  • [3] Optimal subsampling for modal regression in massive data
    Chao, Yue
    Huang, Lei
    Ma, Xuejun
    Sun, Jiajun
    Metrika, 2024, 87(4): 379–409
  • [4] Optimal subsampling for multiplicative regression with massive data
    Wang, Tianzhen
    Zhang, Haixiang
    Statistica Neerlandica, 2022, 76(4): 418–449
  • [6] Random perturbation subsampling for rank regression with massive data
    He, Sijin
    Xia, Xiaochao
    Statistics and Computing, 2025, 35(1)
  • [7] Distributed optimal subsampling for quantile regression with massive data
    Chao, Yue
    Ma, Xuejun
    Zhu, Boya
    Journal of Statistical Planning and Inference, 2024, 233
  • [8] Optimal subsampling algorithms for composite quantile regression in massive data
    Jin, Jun
    Liu, Shuangzhe
    Ma, Tiefeng
    Statistics, 2023, 57(4): 811–843
  • [9] Optimal subsampling for composite quantile regression model in massive data
    Shao, Yujing
    Wang, Lei
    Statistical Papers, 2022, 63(4): 1139–1161