Global and local two-sample tests via regression

Cited by: 14
Authors
Kim, Ilmun [1]
Lee, Ann B. [1]
Lei, Jing [1]
Affiliation
[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Source
ELECTRONIC JOURNAL OF STATISTICS | 2019 / Vol. 13 / No. 2
Keywords
Galaxy morphology; intrinsic dimension; kernel regression; nearest neighbor regression; permutation test; random forests; MINIMAX RATES; DENSITY; MODELS; CLASSIFICATION; STATISTICS; LIKELIHOOD
DOI
10.1214/19-EJS1648
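Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification code
020208; 070103; 0714
Abstract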
Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature, there have been recent methodological developments such as classification accuracy tests. The goal of this work is to present a regression approach to comparing multivariate distributions of complex data. Depending on the chosen regression model, our framework can efficiently handle different types of variables and various structures in the data, with competitive power under many practical scenarios. Whereas previous work has been largely limited to global tests which conceal much of the local information, our approach naturally leads to a local two-sample testing framework in which we identify local differences between multivariate distributions with statistical confidence. We demonstrate the efficacy of our approach both theoretically and empirically, under some well-known parametric and nonparametric regression methods. Our proposed methods are applied to simulated data as well as a challenging astronomy data set to assess their practical usefulness.
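The regression idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation; it assumes a k-nearest-neighbor regressor (one of the keyword methods), a global statistic equal to the mean squared deviation of the fitted class probability from the pooled class proportion, and a permutation p-value. The function name `regression_two_sample_test` and the parameters `k`, `n_perm` and `seed` are hypothetical choices made for this illustration.

```python
import numpy as np

def regression_two_sample_test(X0, X1, k=10, n_perm=200, seed=0):
    """Illustrative global two-sample test via k-NN regression (a sketch).

    Sample membership (0 or 1) is regressed on the pooled features. If both
    samples share one distribution, the fitted probability should stay close
    to the overall proportion of label 1 everywhere.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(len(X0)), np.ones(len(X1))])

    # Pairwise squared Euclidean distances between all pooled points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest neighbors of each point (the point itself is
    # included; an assumption made to keep the sketch short). They depend
    # only on X, so they are computed once and reused for every permutation.
    idx = np.argsort(d2, axis=1)[:, :k]

    def statistic(labels):
        m_hat = labels[idx].mean(axis=1)   # k-NN estimate of P(label = 1 | x)
        return np.mean((m_hat - labels.mean()) ** 2)

    observed = statistic(y)
    permuted = np.array([statistic(rng.permutation(y)) for _ in range(n_perm)])
    # Permutation p-value with the usual +1 correction.
    p_value = (1 + np.sum(permuted >= observed)) / (1 + n_perm)
    return observed, p_value

# Usage: two Gaussian samples whose means differ by 3 in every coordinate.
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 2))
B = rng.normal(loc=3.0, size=(100, 2))
stat, p = regression_two_sample_test(A, B)
print(f"statistic = {stat:.3f}, p-value = {p:.3f}")
```

In this sketch the per-point values `m_hat - labels.mean()` show where the fitted probability departs from the pooled proportion, which is the kind of local information the abstract refers to. Assessing those local differences with statistical confidence would need a separate procedure that is not shown here.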
Pages: 5253-5305
Number of pages: 53
Related papers
50 in total
  • [1] Local significant differences from nonparametric two-sample tests
    Duong, Tarn
    [J]. JOURNAL OF NONPARAMETRIC STATISTICS, 2013, 25 (03) : 635 - 645
  • [2] Characterising transitive two-sample tests
    Lumley, Thomas
    Gillen, Daniel L.
    [J]. STATISTICS & PROBABILITY LETTERS, 2016, 109 : 118 - 123
  • [3] Scalable kernel two-sample tests via empirical likelihood and jackknife
    Wen, Qian
    Yuan, Mingao
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2023, 52 (12) : 5975 - 5990
  • [4] Generalized kernel two-sample tests
    Song, Hoseung
    Chen, Hao
    [J]. BIOMETRIKA, 2024, 111 (03) : 755 - 770
  • [5] TWO-SAMPLE TESTS FOR HIGH-DIMENSIONAL LINEAR REGRESSION WITH AN APPLICATION TO DETECTING INTERACTIONS
    Xia, Yin
    Cai, Tianxi
    Cai, T. Tony
    [J]. STATISTICA SINICA, 2018, 28 (01) : 63 - 92
  • [6] Empirical likelihood tests for two-sample problems via nonparametric density estimation
    Cao, R
    Van Keilegom, I
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2006, 34 (01) : 61 - 77
  • [7] Two-sample tests for multivariate functional data
    Jiang, Qing
    Meintanis, Simos G.
    Zhu, Lixing
    [J]. FUNCTIONAL STATISTICS AND RELATED FIELDS, 2017, : 145 - 154
  • [8] One- and two-sample t tests
    Hess, Aaron S.
    Hess, John R.
    [J]. TRANSFUSION, 2017, 57 (10) : 2319 - 2320
  • [9] Saddlepoint approximations to the two-sample permutation tests
    Jing, Bingyi
    [J]. Acta Mathematicae Applicatae Sinica, 1998, 14 (2) : 197 - 201
  • [10] Two-Sample Tests for Comparing Measurement Systems
    Majeske, Karl D.
    [J]. QUALITY ENGINEERING, 2012, 24 (04) : 501 - 513