Kernel two-sample tests for manifold data

Authors
Cheng, Xiuyuan [1]
Xie, Yao [2]
Affiliations
[1] Duke Univ, Dept Math, Durham, NC USA
[2] Georgia Inst Technol, Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA
Keywords
Kernel methods; manifold data; Maximum Mean Discrepancy; two-sample test; goodness-of-fit; spectral convergence; graph Laplacian; probability; statistics
DOI
10.3150/23-BEJ1685
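Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract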
We present a study of a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting, assuming that high-dimensional observations are close to a low-dimensional manifold. We characterize the test level and power in relation to the kernel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold. Specifically, when data densities $p$ and $q$ are supported on a $d$-dimensional sub-manifold $M$ embedded in an $m$-dimensional space and are Hölder with order $\beta$ (up to 2) on $M$, we prove a guarantee of the test power for finite sample size $n$ that exceeds a threshold depending on $d$, $\beta$, and $\Delta_2$, the squared $L^2$-divergence between $p$ and $q$ on the manifold, and with a properly chosen kernel bandwidth $\gamma$. For small density departures, we show that with large $n$ they can be detected by the kernel test when $\Delta_2$ is greater than $n^{-2\beta/(d+4\beta)}$ up to a certain constant and $\gamma$ scales as $n^{-1/(d+4\beta)}$. The analysis extends to cases where the manifold has a boundary and the data samples contain high-dimensional additive noise. Our results indicate that the kernel two-sample test has no curse-of-dimensionality when the data lie on or near a low-dimensional manifold. We validate our theory and the properties of the kernel test for manifold data through a series of numerical experiments.
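
To make the setting in the abstract concrete, the following is a minimal sketch of a kernel two-sample test, not the authors' implementation. It assumes a Gaussian kernel, the standard biased (V-statistic) MMD estimate, and a permutation threshold; the record does not say which of these the paper uses. The function names and the constant `c` are hypothetical. Only the bandwidth exponent $-1/(d+4\beta)$ is taken from the abstract.

```python
import numpy as np


def gaussian_gram(z, gamma):
    """Gram matrix K[i, j] = exp(-||z_i - z_j||^2 / (2 * gamma^2)).

    Assumption: a Gaussian kernel with bandwidth gamma; the abstract only
    says "kernel bandwidth gamma" without naming the kernel.
    """
    sq_norms = np.sum(z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * z @ z.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * gamma ** 2))


def mmd2_from_gram(K, idx_x, idx_y):
    """Biased squared-MMD estimate from a pooled Gram matrix."""
    k_xx = K[np.ix_(idx_x, idx_x)].mean()
    k_yy = K[np.ix_(idx_y, idx_y)].mean()
    k_xy = K[np.ix_(idx_x, idx_y)].mean()
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(x, y, gamma, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = np.random.default_rng(seed)
    n_x = len(x)
    pooled = np.vstack([x, y])
    K = gaussian_gram(pooled, gamma)  # computed once, re-indexed per permutation
    idx = np.arange(len(pooled))
    observed = mmd2_from_gram(K, idx[:n_x], idx[n_x:])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = mmd2_from_gram(K, perm[:n_x], perm[n_x:])
        exceed += int(stat >= observed)
    # The +1 correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


def scaled_bandwidth(n, d, beta=2.0, c=1.0):
    """Bandwidth gamma = c * n^(-1 / (d + 4 * beta)).

    The exponent follows the abstract; the constant c is a hypothetical
    tuning parameter that the abstract leaves unspecified.
    """
    return c * n ** (-1.0 / (d + 4.0 * beta))
```

In this sketch the bandwidth depends on the intrinsic dimension `d` rather than the ambient dimension `m`, which mirrors the abstract's point about the absence of a curse of dimensionality. For samples on a circle embedded in three dimensions, one would pass `d=1`, for example `mmd_permutation_test(x, y, gamma=scaled_bandwidth(len(x), d=1))`.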
Pages: 2572-2597
Page count: 26