A scalable nonparametric specification testing for massive data

Cited by: 4
Authors
Zhao, Yanyan [1 ,2 ]
Zou, Changliang [1 ,2 ]
Wang, Zhaojun [1 ,2 ]
Affiliations
[1] Nankai Univ, Inst Stat, Tianjin, Peoples R China
[2] Nankai Univ, LPMC, Tianjin, Peoples R China
Keywords
Adaptive test; Asymptotic normality; Lack-of-fit test; Rate-optimal; Sample-splitting method; OF-FIT TESTS; REGRESSION-CURVES; FUNCTIONAL FORM; CONSISTENT TEST; MODEL; SELECTION; EQUALITY; RATES;
DOI
10.1016/j.jspi.2018.09.012
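Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract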
Lack-of-fit checking for parametric models is essential in reducing misspecification. However, for the massive data sets that are increasingly prevalent, classical tests become prohibitively costly to compute, and their feasibility is questionable even with modern parallel computing platforms. Building on the divide-and-conquer strategy, we propose a new nonparametric testing method that is fast to compute and easy to implement, with only one tuning parameter determined by a given time budget. Under mild conditions, we show that the proposed test statistic is asymptotically equivalent to that based on the whole data. By using the sample-splitting idea to choose the smoothing parameter, the proposed test controls the type-I error rate well when calibrated with its asymptotic distribution and achieves adaptive rate-optimal detection properties. Its advantage relative to existing methods is also demonstrated in numerical simulations and a data illustration. (C) 2018 Elsevier B.V. All rights reserved.
Pages: 161-175
Number of pages: 15
Related papers
50 in total
  • [41] SPECIFICATION AND TESTING OF ABSTRACT-DATA-TYPES
    JALOTE, P
    COMPUTER LANGUAGES, 1992, 17 (01): : 75 - 82
  • [42] Specification testing for regression models with dependent data
    Hidalgo, J.
    JOURNAL OF ECONOMETRICS, 2008, 143 (01) : 143 - 165
  • [43] A scalable Bayesian nonparametric model for large spatio-temporal data
    Barzegar, Zahra
    Rivaz, Firoozeh
    COMPUTATIONAL STATISTICS, 2020, 35 (01) : 153 - 173
  • [44] A scalable Bayesian nonparametric model for large spatio-temporal data
    Zahra Barzegar
    Firoozeh Rivaz
    Computational Statistics, 2020, 35 : 153 - 173
  • [45] Bayesian nonparametric hypothesis testing for longitudinal data analysis
    Pereira L.A.
    Gutiérrez L.
    Taylor-Rodríguez D.
    Mena R.H.
    Computational Statistics and Data Analysis, 2023, 179
  • [46] Scalable Nonparametric Tensor Analysis
    Zhe, Shandian
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 5058 - 5059
  • [47] Scalable Visualization and Interactive Analysis using Massive Data Streams
    Pascucci, Valerio
    Bremer, Peer-Timo
    Gyulassy, Attila
    Scorzelli, Giorgio
    Christensen, Cameron
    Summa, Brian
    Kumar, Sidharth
    CLOUD COMPUTING AND BIG DATA, 2013, 23 : 212 - 230
  • [48] A scalable heterogeneous solution for massive data collection and database loading
    Shani, Uri
    Sela, Aviad
    Akilov, Alex
    Skarbovski, Inna
    Berk, David
    BUSINESS INTELLIGENCE FOR THE REAL-TIME ENTERPRISES, 2007, 4365 : 50 - +
  • [49] A scalable parallel subspace clustering algorithm for massive data sets
    Nagesh, HS
    Goil, S
    Choudhary, A
    2000 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS, 2000, : 477 - 484
  • [50] A Demonstration of Summit: a Scalable Data Management Framework for Massive Trajectory
    Alarabi, Louai
    Mokbel, Mohamed F.
    2020 21ST IEEE INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM 2020), 2020, : 226 - 227