Two-sample test for equal distributions in separable metric space: New maximum mean discrepancy based approaches

Times cited: 4
Authors
Zhang, Jin-Ting [1]
Smaga, Lukasz [2]
Affiliations
[1] Natl Univ Singapore, Dept Stat & Appl Probabil, 3 Sci Dr 2, Singapore 117546, Singapore
[2] Adam Mickiewicz Univ, Fac Math & Comp Sci, Uniwersytetu Poznanskiego 4, PL-61614 Poznan, Poland
Source
ELECTRONIC JOURNAL OF STATISTICS | 2022 / Vol. 16 / No. 2
Keywords
Equality of distribution; hypothesis testing; maximum mean discrepancy; three-cumulant matched chi-square approximation; two-sample problem
DOI
10.1214/22-EJS2033
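Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714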
Abstract
This article develops statistical methods for testing the equality of two distributions based on two independent samples generated in a separable metric space. Such methods are broadly applicable for identifying similarities or differences between two complex data sets (e.g., high-dimensional data or functional data) collected in a wide range of research and industry areas, including biology, bioinformatics, medicine, and materials science. Recently, a so-called maximum mean discrepancy (MMD) based approach to this two-sample problem has been proposed, resulting in several interesting tests. However, the main theoretical and numerical results for these MMD-based tests depend on the very restrictive assumption that the two samples have equal sample sizes. In addition, these tests are generally implemented via permutation when the equal sample size assumption is violated. In real data analysis, the equal sample size assumption is rarely satisfied, and discarding observations to equalize the sample sizes often means losing valuable information. It is also of interest to know whether an MMD-based test can be conducted in general without using permutation. In this paper, we further study the MMD-based approach with the equal sample size assumption removed. We establish the asymptotic null and alternative distributions of the MMD test statistic and its root-n consistency. We propose methods for approximating the null distribution, resulting in easy and quick implementation. Numerical experiments based on artificial data and two real data sets from two different application areas demonstrate that, in terms of type I error control and power, the resulting tests perform better than, or no worse than, several existing competitors.
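For illustration only, the two ingredients named in the abstract and keywords can be sketched as follows: an MMD statistic that allows unequal sample sizes n and m, and a three-cumulant matched chi-square approximation of its null distribution. This is a minimal sketch under stated assumptions, not the authors' method. It assumes Euclidean data, a Gaussian kernel with bandwidth `sigma`, and the standard unbiased MMD² estimator, which may differ from the paper's exact statistic. The null cumulants `k1`, `k2`, `k3` are taken as inputs because how the paper estimates them is not described here. The approximation matches T to beta0 + beta1·χ²_d, which gives beta1 = K3/(4·K2), d = 8·K2³/K3², and beta0 = K1 − beta1·d. All function names are hypothetical.

```python
import math

def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel on Euclidean points given as sequences of floats (assumed kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased squared-MMD estimate; the sample sizes n and m may differ (each >= 2)."""
    n, m = len(xs), len(ys)
    if n < 2 or m < 2:
        raise ValueError("each sample needs at least two observations")
    kxx = sum(kernel(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    kyy = sum(kernel(ys[i], ys[j]) for i in range(m) for j in range(m) if i != j)
    kxy = sum(kernel(x, y) for x in xs for y in ys)
    return kxx / (n * (n - 1)) + kyy / (m * (m - 1)) - 2.0 * kxy / (n * m)

def _upper_reg_gamma(a, x, eps=1e-14, max_iter=10_000):
    """Regularized upper incomplete gamma Q(a, x) via series or continued fraction."""
    if x <= 0.0:
        return 1.0
    log_pref = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); then Q = 1 - P.
        ap, term = a, 1.0 / a
        total = term
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_pref)
    # Modified Lentz continued fraction for Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_pref) * h

def chi2_sf(x, df):
    """Upper tail P(chi2_df > x) for real (not necessarily integer) df."""
    return _upper_reg_gamma(df / 2.0, x / 2.0)

def three_cumulant_chi2(k1, k2, k3):
    """Match the first three cumulants of beta0 + beta1 * chi2_d to (k1, k2, k3)."""
    if k2 <= 0 or k3 <= 0:
        raise ValueError("need positive second and third cumulants")
    beta1 = k3 / (4.0 * k2)
    d = 8.0 * k2 ** 3 / k3 ** 2
    beta0 = k1 - beta1 * d
    return beta0, beta1, d

def approx_pvalue(t, k1, k2, k3):
    """Approximate p-value of an observed statistic t, given assumed null cumulants."""
    beta0, beta1, d = three_cumulant_chi2(k1, k2, k3)
    return chi2_sf((t - beta0) / beta1, d)
```

Because the unbiased estimator removes the diagonal kernel terms, it can be negative when the two distributions are equal. Matching three cumulants rather than two lets the approximating chi-square capture the skewness of the null distribution.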
Pages: 4090-4132
Number of pages: 43