A distance metric-based space-filling subsampling method for nonparametric models

Authors
Diao, Huaimi [1]
Wang, Dianpeng [1]
He, Xu [2]
Affiliations
[1] Beijing Inst Technol, Sch Math & Stat, Beijing, Peoples R China
[2] Chinese Acad Sci, Acad Math & Syst Sci, MADIS, Beijing, Peoples R China
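Source
ELECTRONIC JOURNAL OF STATISTICS | 2024 / Volume 18 / Issue 02
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords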
Big data; nonparametric model; space-filling design; tall data;
DOI
10.1214/24-EJS2251
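Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes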
020208 ; 070103 ; 0714 ;
Abstract
Taking subset samples from the original data set is an efficient and popular strategy for handling massive data that is too large to be modeled directly. To optimize inference and prediction accuracy, it is crucial to employ a subsampling scheme that collects observations intelligently. In this paper, we propose a space-filling subsampling method that uses distance metric-based strata to select subsamples from high-volume data sets. To minimize the maximal distance between pairs of samples that lie in the same stratum, Voronoi cells of thinnest covering lattices are used to partition the input space. In addition, subsamples that are space-filling with respect to the response are collected from each stratum. With the help of an algorithm that quickly identifies the cell in which an observation lies, the computational cost of our subsampling method is proportional to the number of observations and independent of the number of cells, which makes our method applicable to extremely large data sets. Results from simulation studies and real data analysis show that the new method is remarkably better than existing approaches when used in conjunction with Gaussian process models.
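The stratification idea in the abstract can be illustrated with a minimal two-dimensional sketch. It is not the authors' algorithm. It assumes the hexagonal lattice (the thinnest lattice covering of the plane), finds each observation's Voronoi cell by rounding in two rectangular cosets, and then keeps evenly spaced order statistics of the response within each cell. The function names `hex_cell_ids` and `stratified_subsample`, the `scale` and `per_cell` parameters, and the within-cell selection rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict

import numpy as np

# The hexagonal lattice is the union of the rectangular lattice
# Z x sqrt(3)Z and a copy of it shifted by (1/2, sqrt(3)/2).
STEP = np.array([1.0, np.sqrt(3.0)])
SHIFT = np.array([0.5, np.sqrt(3.0) / 2.0])


def hex_cell_ids(X, scale=1.0):
    """Return (ids, centers): the hexagonal Voronoi cell of each 2-D point."""
    Z = np.asarray(X, dtype=float) / scale
    ia = np.round(Z / STEP)            # nearest point in the unshifted coset
    ib = np.round((Z - SHIFT) / STEP)  # nearest point in the shifted coset
    a = ia * STEP
    b = ib * STEP + SHIFT
    # The nearest lattice point is the closer of the two coset candidates.
    use_b = ((Z - b) ** 2).sum(axis=1) < ((Z - a) ** 2).sum(axis=1)
    centers = np.where(use_b[:, None], b, a) * scale
    index = np.where(use_b[:, None], ib, ia).astype(np.int64)
    ids = np.column_stack([use_b.astype(np.int64), index])
    return ids, centers


def stratified_subsample(X, y, scale=1.0, per_cell=1):
    """Pick up to per_cell observations from every occupied cell."""
    y = np.asarray(y, dtype=float)
    ids, _ = hex_cell_ids(X, scale)
    groups = defaultdict(list)
    for i, key in enumerate(map(tuple, ids)):  # one pass over the data
        groups[key].append(i)
    chosen = []
    for members in groups.values():
        members = np.asarray(members)
        ranked = members[np.argsort(y[members])]
        n, k = len(ranked), min(per_cell, len(ranked))
        # Assumed rule: midpoints of k equal-width rank bins of the response.
        pos = ((2 * np.arange(k) + 1) * n) // (2 * k)
        chosen.extend(ranked[pos].tolist())
    return np.array(sorted(chosen), dtype=np.int64)
```

Cell lookup here costs a fixed number of operations per observation regardless of how many cells exist, which mirrors the cost property the abstract describes. The within-cell sort is a simplification, and higher-dimensional thinnest covering lattices would need a different nearest-point routine.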
Pages: 3247 - 3273
Page count: 27