DISTRIBUTED SUFFICIENT DIMENSION REDUCTION FOR HETEROGENEOUS MASSIVE DATA

Cited by: 2
Authors
Xu, Kelin [1 ]
Zhu, Liping [2 ,3 ]
Fan, Jianqing [4 ]
Affiliations
[1] Fudan Univ, Sch Publ Hlth, Shanghai, Peoples R China
[2] Renmin Univ China, Ctr Appl Stat, Beijing, Peoples R China
[3] Renmin Univ China, Inst Stat & Big Data, Beijing, Peoples R China
[4] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ USA
Funding
Beijing Natural Science Foundation
Keywords
Cumulative slicing estimation; distributed estimation; heterogeneity; sliced inverse regression; sufficient dimension reduction; SLICED INVERSE REGRESSION; CONFIDENCE-INTERVALS; ASYMPTOTICS
DOI
10.5705/ss.202021.0031
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
We propose a distributed sufficient dimension reduction method to process massive data characterized by high dimensionality, heterogeneity, and huge sample sizes. To address the high dimensionality, we replace the high-dimensional explanatory variables with a small number of linear projections that are sufficient to explain the variability of the response variable. We allow for distinct function maps for data scattered at different locations, thus addressing the problem of heterogeneity. We assume that the dimension reduction subspaces at different local nodes are identical. This allows us to aggregate the local results obtained from each local node to yield a final estimate on a central server. We explicitly examine sliced inverse regression and cumulative slicing estimation, and investigate the nonasymptotic error bounds of the resulting dimension reduction estimators. Our theoretical results are further supported by simulation studies and an application to meta-genome data from the American Gut Project.
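The aggregation idea described in the abstract can be illustrated with a minimal sketch, assuming sliced inverse regression at each node and a sample-size-weighted average of the local kernel matrices on the central server. The function names (`local_sir_kernel`, `aggregate_directions`), the weighting scheme, the simulated link functions, and all tuning values are assumptions made for illustration; they are not taken from the paper and may differ from the authors' estimator.

```python
import numpy as np

def local_sir_kernel(X, y, n_slices=10):
    """Local SIR kernel Sigma^{-1} Cov(E[X | slice of y]) Sigma^{-1} on one node."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n
    Lambda = np.zeros((p, p))
    # Slice the response by rank and accumulate weighted outer products
    # of the within-slice means of the centered predictors.
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Xc[idx].mean(axis=0)
        Lambda += (len(idx) / n) * np.outer(m, m)
    A = np.linalg.solve(Sigma, Lambda)      # Sigma^{-1} Lambda
    return np.linalg.solve(Sigma, A.T)      # Sigma^{-1} Lambda Sigma^{-1}

def aggregate_directions(kernels, sizes, d):
    """Central server: average local kernels (assumed weights n_k / n), take top-d eigenvectors."""
    weights = np.asarray(sizes, dtype=float) / np.sum(sizes)
    M = sum(w * K for w, K in zip(weights, kernels))
    M = (M + M.T) / 2                        # remove numerical asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    return eigvecs[:, -d:]

# Simulated heterogeneous nodes: a shared direction beta, a different
# monotone link function at each node (hypothetical choices).
rng = np.random.default_rng(0)
p, n_per_node = 6, 2000
beta = np.zeros(p)
beta[0] = 1.0
links = [lambda t: t, lambda t: t ** 3, np.exp, lambda t: 2 * t + np.tanh(t)]

kernels, sizes = [], []
for g in links:
    X = rng.standard_normal((n_per_node, p))
    y = g(X @ beta) + 0.5 * rng.standard_normal(n_per_node)
    kernels.append(local_sir_kernel(X, y))   # only a p x p matrix leaves the node
    sizes.append(n_per_node)

B = aggregate_directions(kernels, sizes, d=1)
cosine = abs(float(B[:, 0] @ beta)) / np.linalg.norm(B[:, 0])
print("absolute cosine between estimate and beta:", cosine)
```

Each node sends only a p x p matrix to the server, so communication cost does not grow with the local sample size. Because every local kernel has its column space inside the common subspace, differing link functions across nodes do not prevent averaging in this sketch.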
Pages: 2455 - 2476
Number of pages: 22
Related papers
50 in total
  • [1] Sufficient dimension reduction for compositional data
    Tomassi, Diego
    Forzani, Liliana
    Duarte, Sabrina
    Pfeiffer, Ruth M.
    [J]. BIOSTATISTICS, 2021, 22 (04) : 687 - 705
  • [2] SUFFICIENT DIMENSION REDUCTION FOR LONGITUDINAL DATA
    Bi, Xuan
    Qu, Annie
    [J]. STATISTICA SINICA, 2015, 25 (02) : 787 - 807
  • [3] Distributed estimation in heterogeneous reduced rank regression: With application to order determination in sufficient dimension reduction
    Chen, Canyi
    Xu, Wangli
    Zhu, Liping
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2022, 190
  • [4] Sufficient dimension reduction in regressions across heterogeneous subpopulations
    Ni, LQ
    Cook, RD
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2006, 68 : 89 - 107
  • [5] NONLINEAR SUFFICIENT DIMENSION REDUCTION FOR FUNCTIONAL DATA
    Li, Bing
    Song, Jun
    [J]. ANNALS OF STATISTICS, 2017, 45 (03) : 1059 - 1095
  • [6] Missing data analysis with sufficient dimension reduction
    Zheng, Siming
    Wan, Alan T. K.
    Zhou, Yong
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2023, 51 (02) : 630 - 651
  • [7] Adaptive Randomized Dimension Reduction on Massive Data
    Darnell, Gregory
    Georgiev, Stoyan
    Mukherjee, Sayan
    Engelhardt, Barbara E.
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [8] Distributed quantile regression for massive heterogeneous data
    Hu, Aijun
    Jiao, Yuling
    Liu, Yanyan
    Shi, Yueyong
    Wu, Yuanshan
    [J]. NEUROCOMPUTING, 2021, 448 : 249 - 262
  • [9] Adaptive randomized dimension reduction on massive data
    [J]. 1600, Microtome Publishing (18):
  • [10] Functional Sufficient Dimension Reduction for Functional Data Classification
    Wang, Guochang
    Song, Xinyuan
    [J]. JOURNAL OF CLASSIFICATION, 2018, 35 (02) : 250 - 272