Kernel two-sample testing is a useful statistical tool for determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes current tests vulnerable to privacy breaches. Hence, we design a new framework for kernel two-sample testing that conforms to differential privacy constraints, in order to guarantee the privacy of subjects in the data. Unlike existing differentially private parametric tests that simply add noise to data, kernel-based testing poses a challenge because its test statistics depend on the raw data in a complex way: they are estimators of distances between representations of probability measures in Hilbert spaces. Our approach considers finite-dimensional approximations to those representations. As a result, we obtain a simple chi-squared test whose statistic depends on the mean and covariance of empirical differences between the samples, which we perturb to obtain a privacy guarantee. We investigate the utility of our framework in two realistic settings and conclude that our method requires only a relatively modest increase in sample size to achieve a similar level of power to the non-private tests in both settings.
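To make the structure of such a test concrete, the following is a minimal sketch, not the authors' implementation. It assumes paired samples of equal size, random Fourier features for a unit-bandwidth Gaussian kernel as the finite-dimensional approximation, and plain Gaussian perturbation of the mean and covariance. The function names and the `noise_scale` and `ridge` parameters are hypothetical, and `noise_scale` is not calibrated to any privacy budget: a real differentially private test would derive it from a sensitivity analysis that is not shown here.

```python
import math
import numpy as np


def random_fourier_features(x, freqs):
    """Map samples of shape (n, d) to (n, 2J) features [cos(xW), sin(xW)] / sqrt(J)."""
    proj = x @ freqs
    return np.hstack([np.cos(proj), np.sin(proj)]) / math.sqrt(freqs.shape[1])


def chi2_sf_even_dof(stat, dof):
    """Survival function of a chi-squared variable with an even number of degrees of freedom."""
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def mean_embedding_test(x, y, num_freqs=5, noise_scale=0.0, ridge=1e-5, seed=0):
    """Chi-squared test on feature differences; returns (statistic, p-value)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Assumption: Gaussian kernel with unit bandwidth, so frequencies are N(0, I).
    freqs = rng.standard_normal((d, num_freqs))
    # Empirical differences between the paired feature representations.
    z = random_fourier_features(x, freqs) - random_fourier_features(y, freqs)
    mean = z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    if noise_scale > 0:
        # Placeholder perturbation: NOT calibrated to an (epsilon, delta) budget.
        mean = mean + noise_scale * rng.standard_normal(mean.shape)
        noise = noise_scale * rng.standard_normal(cov.shape)
        cov = cov + (noise + noise.T) / 2.0
        # Clip negative eigenvalues so the noisy covariance stays positive semidefinite.
        eigvals, eigvecs = np.linalg.eigh(cov)
        cov = (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T
    cov = cov + ridge * np.eye(cov.shape[0])
    stat = float(n * mean @ np.linalg.solve(cov, mean))
    return stat, chi2_sf_even_dof(stat, 2 * num_freqs)
```

With `noise_scale=0` this reduces to a non-private test whose statistic is approximately chi-squared with `2 * num_freqs` degrees of freedom under the null hypothesis. Once noise is added, that reference distribution is only a rough approximation, so the p-value returned here should not be read as exact.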