FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test

Cited by: 33

Authors:

Zhao, Ji [1]

Meng, Deyu [2]

Affiliations:

[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA

[2] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Peoples R China

Source:

NEURAL COMPUTATION | 2015 / Vol. 27 / No. 06

Keywords:

STATISTICS;

DOI:

10.1162/NECO_a_00732

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The maximum mean discrepancy (MMD) is a recently proposed test statistic for the two-sample test. Its quadratic time complexity, however, greatly hampers its availability to large-scale applications. To accelerate the MMD calculation, in this study we propose an efficient method called FastMMD. The core idea of FastMMD is to equivalently transform the MMD with shift-invariant kernels into the amplitude expectation of a linear combination of sinusoid components based on Bochner's theorem and Fourier transform (Rahimi & Recht, 2007). Taking advantage of sampling the Fourier transform, FastMMD decreases the time complexity for MMD calculation from O(N^2 d) to O(LN d), where N and d are the size and dimension of the sample set, respectively. Here, L is the number of basis functions for approximating kernels that determines the approximation accuracy. For kernels that are spherically invariant, the computation can be further accelerated to O(LN log d) by using the Fastfood technique (Le, Sarlos, & Smola, 2013). The uniform convergence of our method has also been theoretically proved in both unbiased and biased estimates. We also provide a geometric explanation for our method, ensemble of circular discrepancy, which helps us understand the insight of MMD and we hope will lead to more extensive metrics for assessing the two-sample test task. Experimental results substantiate that the accuracy of FastMMD is similar to that of MMD and with faster computation and lower variance than existing MMD approximation methods.
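The complexity reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel with bandwidth sigma, uses the biased MMD estimate, and draws the L frequencies with plain random Fourier features (no Fastfood acceleration). The function names and parameters are hypothetical, chosen for illustration only.

```python
import numpy as np

def gaussian_mmd2_biased(X, Y, sigma):
    """Exact biased MMD^2 with a Gaussian kernel; O(N^2 d) time."""
    def gram(A, B):
        # Pairwise squared Euclidean distances between rows of A and B.
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

def rff_mmd2_biased(X, Y, sigma, L, rng):
    """Approximate biased MMD^2 from L sampled frequencies; O(L N d) time.

    Assumption: Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)), whose
    spectral distribution (Bochner's theorem) is N(0, sigma^-2 I).
    """
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, L))  # one frequency per column

    def mean_embedding(A):
        P = A @ W  # projections w^T x, shape (N, L)
        # Per-frequency mean of the cos and sin components over the sample.
        return np.concatenate([np.cos(P).mean(0), np.sin(P).mean(0)])

    diff = mean_embedding(X) - mean_embedding(Y)
    # For each frequency, cos^2 + sin^2 of the difference is the squared
    # amplitude of the combined sinusoid; average it over the L frequencies.
    return float(diff @ diff) / L
```

For each sampled frequency, the squared difference of the cosine and sine means is the squared amplitude of a linear combination of sinusoids, so averaging over frequencies mirrors the "amplitude expectation" mentioned in the abstract. Increasing L should bring the approximation closer to the exact quadratic-time value.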


Pages: 1345-1372

Page count: 28

50 items in total

[21] A Kernel Two-Sample Test for Functional Data
Wynne, George
Duncan, Andrew B.
[J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
[22] The score test for the two-sample occupancy model
Karavarsamis, N.
Guillera-Arroita, G.
Huggins, R. M.
Morgan, B. J. T.
[J]. AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2020, 62 (01) : 94 - 115
[23] Two-Sample Hypothesis Test for Functional Data
Zhao, Jing
Feng, Sanying
Hu, Yuping
[J]. MATHEMATICS, 2022, 10 (21)
[24] A two-sample test when data are contaminated
Pommeret, Denys
[J]. STATISTICAL METHODS AND APPLICATIONS, 2013, 22 (04) : 501 - 516
[25] Two-sample test based on classification probability
Cai, Haiyan
Goggin, Bryan
Jiang, Qingtang
[J]. STATISTICAL ANALYSIS AND DATA MINING, 2020, 13 (01) : 5 - 13
[26] Network two-sample test for block models
Department of Statistics, UCLA, United States
[J]. arXiv,
[27] A two-sample nonparametric likelihood ratio test
Marsh, Patrick
[J]. JOURNAL OF NONPARAMETRIC STATISTICS, 2010, 22 (08) : 1053 - 1065
[28] An adjusted, asymmetric two-sample t test
Balkin, SD
Mallows, CL
[J]. AMERICAN STATISTICIAN, 2001, 55 (03) : 203 - 206
[29] Least-squares two-sample test
Sugiyama, Masashi
Suzuki, Taiji
Itoh, Yuta
Kanamori, Takafumi
Kimura, Manabu
[J]. NEURAL NETWORKS, 2011, 24 (07) : 735 - 751
[30] A Differentially Private Kernel Two-Sample Test
Raj, Anant
Law, Ho Chung Leon
Sejdinovic, Dino
Park, Mijung
[J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT I, 2020, 11906 : 697 - 724
