Two-sample test for equal distributions in separable metric space: New maximum mean discrepancy based approaches

Cited by: 4

Authors:

Zhang, Jin-Ting [1]

Smaga, Lukasz [2]

Affiliations:

[1] Natl Univ Singapore, Dept Stat & Appl Probabil, 3 Sci Dr 2, Singapore 117546, Singapore

[2] Adam Mickiewicz Univ, Fac Math & Comp Sci, Uniwersytetu Poznanskiego 4, PL-61614 Poznan, Poland

Source:

ELECTRONIC JOURNAL OF STATISTICS | 2022 / Vol. 16 / Issue 02

Keywords:

Equality of distribution; hypothesis testing; maximum mean discrepancy; three-cumulant matched chi-square approximation; two-sample problem

DOI:

10.1214/22-EJS2033

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

This article develops statistical methods for testing the equality of two distributions based on two independent samples generated in some separable metric space. Such methods are broadly applicable in identifying similarity or distinction of two complicated data sets (e.g., high-dimensional data or functional data) collected in a wide range of research or industry areas, including biology, bioinformatics, medicine, and material science, among others. Recently a so-called maximum mean discrepancy (MMD) based approach for the above two-sample problem has been proposed, resulting in several interesting tests. However, the main theoretical and numerical results of these MMD based tests depend on the very restricted assumption that the two samples have equal sample sizes. In addition, these tests are generally implemented via permutation when the equal sample size assumption is violated. In real data analysis, this equal sample size assumption is hardly satisfied, and dropping away some of the observations often means the loss of priceless information. It is also of interest to know if an MMD-based test can be conducted generally without using permutation. In this paper, we further study this MMD based approach with the equal sample size assumption removed. We establish the asymptotic null and alternative distributions of the MMD test statistic and its root-n consistency. We propose methods for approximating the null distribution, resulting in easy and quick implementation. Numerical experiments based on artificial data and two real data sets from two different areas of applications demonstrate that in terms of control of the type I error level and power, the resulting tests perform better or no worse than several existing competitors.


Pages: 4090 - 4132

Number of pages: 43

15 records in total

[1] A new maximum mean discrepancy based two-sample test for equal distributions in separable metric spaces
Zhou, Bu
Ong, Zhi Peng
Zhang, Jin-Ting
[J]. STATISTICS AND COMPUTING, 2024, 34 (05)
[2] Two-sample test based on maximum variance discrepancy
Makigusa, N.
[J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2024, 53 (15) : 5421 - 5438
[3] New two-sample test utilizing interpoint distance discrepancy
Xu, Dong
[J]. STAT, 2024, 13 (03)
[4] Two-sample high dimensional mean test based on prepivots
Ghosh, Santu
Ayyala, Deepak Nag
Hellebuyck, Rafael
[J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, 163
[5] Testing equality of several distributions in separable metric spaces: A maximum mean discrepancy based approach
Zhang, Jin-Ting
Guo, Jia
Zhou, Bu
[J]. JOURNAL OF ECONOMETRICS, 2024, 239 (02)
[6] A hyperbolic divergence based nonparametric test for two-sample multivariate distributions
Wang, Roulin
Fan, Wei
Wang, Xueqin
[J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2023, 51 (04) : 1034 - 1054
[7] A k-Sample Test for Functional Data Based on Generalized Maximum Mean Discrepancy
Armando Sosthène Kali Balogoun
Guy Martial Nkiet
Carlos Ogouyandjou
[J]. Lithuanian Mathematical Journal, 2022, 62 : 289 - 303
[8] A k-Sample Test for Functional Data Based on Generalized Maximum Mean Discrepancy
Balogoun, Armando Sosthene Kali
Nkiet, Guy Martial
Ogouyandjou, Carlos
[J]. LITHUANIAN MATHEMATICAL JOURNAL, 2022, 62 (03) : 289 - 303
[9] A new test for two-sample location problem based on empirical distribution function
Mathur, S. K.
Sakate, D. M.
[J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2017, 46 (24) : 12345 - 12355
[10] A New Graph-Based Two-Sample Test for Multivariate and Object Data
Chen, Hao
Friedman, Jerome H.
[J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112 (517) : 397 - 409
