Kernel two-sample tests for manifold data

Cited by: 0

Authors:

Cheng, Xiuyuan [1]

Xie, Yao [2]

Affiliations:

[1] Duke Univ, Dept Math, Durham, NC USA

[2] Georgia Inst Technol, Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA

Source:

BERNOULLI | 2024 / Vol. 30 / No. 4

Keywords:

Kernel methods; manifold data; Maximum Mean Discrepancy; two-sample test; GOODNESS-OF-FIT; SPECTRAL CONVERGENCE; GRAPH LAPLACIAN; PROBABILITY; STATISTICS;

DOI:

10.3150/23-BEJ1685

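Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification code:

020208 ; 070103 ; 0714 ;

Abstract: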

We present a study of a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting, assuming that high-dimensional observations are close to a low-dimensional manifold. We characterize the test level and power in relation to the kernel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold. Specifically, when data densities $p$ and $q$ are supported on a $d$-dimensional sub-manifold $\mathcal{M}$ embedded in an $m$-dimensional space and are Hölder with order $\beta$ (up to 2) on $\mathcal{M}$, we prove a guarantee of the test power for finite sample size $n$ that exceeds a threshold depending on $d$, $\beta$, and $\Delta_2$, the squared $L^2$-divergence between $p$ and $q$ on the manifold, and with a properly chosen kernel bandwidth $\gamma$. For small density departures, we show that with large $n$ they can be detected by the kernel test when $\Delta_2$ is greater than $n^{-2\beta/(d+4\beta)}$ up to a certain constant and $\gamma$ scales as $n^{-1/(d+4\beta)}$. The analysis extends to cases where the manifold has a boundary and the data samples contain high-dimensional additive noise. Our results indicate that the kernel two-sample test has no curse-of-dimensionality when the data lie on or near a low-dimensional manifold. We validate our theory and the properties of the kernel test for manifold data through a series of numerical experiments.
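
To make the kind of statistic described in the abstract concrete, the following is a minimal sketch of a kernel two-sample test on manifold data. It is not taken from the paper: the Gaussian kernel form, the biased (V-statistic) MMD estimate, the permutation-based p-value, the circle-in-R^3 toy data, and all function names (`gaussian_gram`, `mmd2_from_gram`, `mmd_permutation_test`, `sample_circle`) are assumptions chosen for illustration. The bandwidth argument `gamma` plays the role of the kernel bandwidth $\gamma$.

```python
import numpy as np


def gaussian_gram(Z, gamma):
    """Gram matrix of an (assumed) Gaussian kernel exp(-||x - y||^2 / (2 gamma^2))."""
    sq = np.sum(Z ** 2, axis=1)
    # Squared pairwise distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-d2 / (2.0 * gamma ** 2))


def mmd2_from_gram(K, idx_x, idx_y):
    """Biased (V-statistic) squared-MMD estimate from a precomputed Gram matrix."""
    k_xx = K[np.ix_(idx_x, idx_x)].mean()
    k_yy = K[np.ix_(idx_y, idx_y)].mean()
    k_xy = K[np.ix_(idx_x, idx_y)].mean()
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(X, Y, gamma, n_perm=200, seed=0):
    """Return the statistic and a permutation p-value (calibration is an assumption)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n_x = len(X)
    K = gaussian_gram(Z, gamma)
    idx = np.arange(len(Z))
    stat = mmd2_from_gram(K, idx[:n_x], idx[n_x:])
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Reassign pooled samples to the two groups at random.
        perm = rng.permutation(len(Z))
        null[b] = mmd2_from_gram(K, perm[:n_x], perm[n_x:])
    p_value = (1.0 + np.sum(null >= stat)) / (1.0 + n_perm)
    return stat, p_value


def sample_circle(n, kappa, rng):
    """Toy manifold data: a unit circle (d = 1) embedded in R^3 (m = 3).

    kappa = 0 gives a uniform angle density; kappa > 0 gives a von Mises density.
    """
    if kappa == 0.0:
        theta = rng.uniform(-np.pi, np.pi, size=n)
    else:
        theta = rng.vonmises(0.0, kappa, size=n)
    return np.column_stack([np.cos(theta), np.sin(theta), np.zeros(n)])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = sample_circle(150, 0.0, rng)  # samples from p
    Y = sample_circle(150, 4.0, rng)  # samples from q
    stat, p_value = mmd_permutation_test(X, Y, gamma=0.5)
    print(f"MMD^2 estimate: {stat:.4f}, permutation p-value: {p_value:.4f}")
```

In this sketch the bandwidth is fixed by hand. The abstract's scaling $\gamma \sim n^{-1/(d+4\beta)}$ suggests shrinking `gamma` with the sample size according to the intrinsic dimension $d$ rather than the ambient dimension $m$, but the constants involved are not given in the abstract and are left unspecified here.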

Pages: 2572-2597

Page count: 26