FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test

Cited by: 33
Authors
Zhao, Ji [1 ]
Meng, Deyu [2 ]
Affiliations
[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[2] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Peoples R China
Keywords
STATISTICS;
DOI
10.1162/NECO_a_00732
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The maximum mean discrepancy (MMD) is a recently proposed test statistic for the two-sample test. Its quadratic time complexity, however, greatly hampers its applicability to large-scale problems. To accelerate the MMD calculation, in this study we propose an efficient method called FastMMD. The core idea of FastMMD is to equivalently transform the MMD with shift-invariant kernels into the amplitude expectation of a linear combination of sinusoid components, based on Bochner's theorem and the Fourier transform (Rahimi & Recht, 2007). By sampling the Fourier transform, FastMMD decreases the time complexity of the MMD calculation from O(N^2 d) to O(LNd), where N and d are the size and dimension of the sample set, respectively, and L is the number of basis functions used to approximate the kernel, which determines the approximation accuracy. For kernels that are spherically invariant, the computation can be further accelerated to O(LN log d) by using the Fastfood technique (Le, Sarlos, & Smola, 2013). The uniform convergence of our method has also been theoretically proved for both unbiased and biased estimates. We also provide a geometric explanation for our method, the ensemble of circular discrepancy, which gives insight into MMD and which we hope will lead to more extensive metrics for the two-sample test task. Experimental results substantiate that the accuracy of FastMMD is similar to that of MMD, with faster computation and lower variance than existing MMD approximation methods.
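The complexity reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' FastMMD implementation. It assumes a Gaussian kernel with bandwidth `sigma` and uses plain random Fourier features in the style of Rahimi & Recht (2007) to form a biased estimate of the squared MMD in O(LNd) time, next to the exact O(N^2 d) computation for comparison. The function names `rff_mmd2_biased` and `exact_mmd2_biased`, the parameter names, and the default values are hypothetical and chosen only for illustration.

```python
import numpy as np


def exact_mmd2_biased(X, Y, sigma=1.0):
    """Exact biased squared MMD with a Gaussian kernel; quadratic in sample size."""
    def gram_mean(A, B):
        # Pairwise squared distances, shape (len(A), len(B)).
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * sigma ** 2)).mean()

    return gram_mean(X, X) + gram_mean(Y, Y) - 2.0 * gram_mean(X, Y)


def rff_mmd2_biased(X, Y, sigma=1.0, L=512, seed=0):
    """Approximate biased squared MMD from L sampled frequencies (assumed Gaussian kernel)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # By Bochner's theorem, the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))
    # is the expectation of cos(w^T (x - y)) with w ~ N(0, sigma^-2 I).
    W = rng.normal(scale=1.0 / sigma, size=(d, L))

    def mean_embedding(Z):
        P = Z @ W  # projections, shape (N, L); this product costs O(L N d)
        return np.concatenate([np.cos(P).mean(axis=0), np.sin(P).mean(axis=0)])

    # For each frequency, the squared amplitude of the difference between the
    # two samples' averaged sinusoids; averaging over frequencies gives MMD^2.
    diff = mean_embedding(X) - mean_embedding(Y)
    return float(diff @ diff / L)
```

With more sampled frequencies `L`, the approximation is expected to track the exact value more closely, at a cost that grows linearly rather than quadratically in the sample size.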
Pages: 1345 - 1372
Number of pages: 28
Related papers
50 in total
  • [31] Dominance refinements of the Smirnov two-sample test
    Di Bucchianico, A
    Loeb, DE
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 1998, 66 (01) : 51 - 60
  • [32] Two-sample test of stochastic block models
    Wu, Qianyong
    Hu, Jiang
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2024, 192
  • [33] On the Two-Sample Randomisation Test for IR Evaluation
    Sakai, Tetsuya
    [J]. SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021 : 1980 - 1984
  • [34] A two-sample nonparametric test with missing observations
    Lee, YJ
    [J]. AMERICAN JOURNAL OF MATHEMATICAL AND MANAGEMENT SCIENCES, VOL 17, NOS 1 AND 2, 1997: MULTIVARIATE STATISTICAL INFERENCE - MSI-2000L MULTIVARIATE STATISTICAL ANALYSIS IN HONOR OF PROFESSOR MINORU SIOTANI ON HIS 70TH BIRTHDAY, 1997, 17 (1&2): : 187 - 200
  • [35] A Kernel Two-Sample Test for Functional Data
    Wynne, George
    Duncan, Andrew B.
    [J]. Journal of Machine Learning Research, 2022, 23 : 1 - 51
  • [36] The new and improved two-sample t test
    Keselman, HJ
    Othman, AR
    Wilcox, RR
    Fradette, K
    [J]. PSYCHOLOGICAL SCIENCE, 2004, 15 (01) : 47 - 51
  • [37] A nonparametric test for the general two-sample problem
    Baumgartner, W
    Weiss, P
    Schindler, H
    [J]. BIOMETRICS, 1998, 54 (03) : 1129 - 1135
  • [38] A two-sample test when data are contaminated
    Pommeret, Denys
    [J]. Statistical Methods & Applications, 2013, 22 : 501 - 516
  • [39] Two-sample test for equal distributions in separate metric space: New maximum mean discrepancy based approaches
    Zhang, Jin-Ting
    Smaga, Lukasz
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2022, 16 (02) : 4090 - 4132
  • [40] Maximum test and adaptive test for the general two-sample problem
    Murakami, Hidetoshi
    Kitani, Masato
    Neuhaeuser, Markus
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2024, 94 (09) : 1874 - 1897