We study the problem of testing conditional independence for discrete distributions. Specifically, given samples from a discrete random vector $(X, Y, Z)$ on the domain $[\ell_1] \times [\ell_2] \times [n]$, we want to distinguish, with probability at least $2/3$, between the case that $X$ and $Y$ are conditionally independent given $Z$ and the case that $(X, Y, Z)$ is $\varepsilon$-far, in $\ell_1$-distance, from every distribution with this property. Conditional independence is a concept of central importance in probability and statistics, with a range of applications across scientific domains. As such, the statistical task of testing conditional independence has been extensively studied, in various forms, within the statistics and econometrics communities for nearly a century. Perhaps surprisingly, this problem has not previously been considered in the framework of distribution property testing; in particular, no tester with sublinear sample complexity is known, even for the important special case in which the domains of $X$ and $Y$ are binary.

The main algorithmic result of this work is the first conditional independence tester with sublinear sample complexity for discrete distributions over $[\ell_1] \times [\ell_2] \times [n]$. To complement our upper bounds, we prove information-theoretic lower bounds establishing that the sample complexity of our algorithm is optimal, up to constant factors, in a number of settings. Specifically, for the prototypical setting where $\ell_1, \ell_2 = O(1)$, we show that the sample complexity of testing conditional independence (upper bound and matching lower bound) is $\Theta\bigl(\max\bigl(n^{1/2}/\varepsilon^2,\ \min\bigl(n^{7/8}/\varepsilon,\ n^{6/7}/\varepsilon^{8/7}\bigr)\bigr)\bigr)$.

To obtain our tester, we employ a variety of tools, including (1) a suitable weighted adaptation of the flattening technique of [DK16], and (2) the design and analysis of an optimal (unbiased) estimator for the following statistical problem of independent interest: given a degree-$d$ polynomial $Q \colon \mathbb{R}^n \to \mathbb{R}$ and sample access to a distribution $p$ over $[n]$, estimate $Q(p_1, \ldots, p_n)$ up to small additive error. Obtaining tight variance analyses for specific estimators of this form has been a major technical hurdle in distribution testing (see, e.g., [CDVV14]). As an important contribution of this work, we develop a general theory providing tight variance bounds for all such estimators. Our lower bounds, established via the mutual information method, rely on novel constructions of hard instances that may be useful in other settings.
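To make point (1) above concrete, the following is a minimal sketch of the basic, unweighted flattening idea of [DK16], on which the paper's weighted adaptation builds; it is not the adaptation itself. The idea: each domain element $i$ is split into $k_i + 1$ sub-elements, where $k_i$ is the number of occurrences of $i$ in a reference sample, and each subsequent sample of $i$ is routed to a uniformly random sub-element. Heavy elements are thus split into many pieces, reducing the $\ell_2$ norm of the flattened distribution while (for a fixed split) preserving $\ell_1$ distances. The function name and the Python rendering are ours, for illustration only.

```python
import numpy as np

def flatten(reference, samples, rng):
    """Basic flattening (sketch, after [DK16]): split element i into
    k_i + 1 sub-elements, where k_i is the number of occurrences of i
    in the reference sample, then send each remaining sample of i to a
    uniformly random sub-element (i, j)."""
    splits = {}
    for x in reference:
        splits[x] = splits.get(x, 0) + 1
    # A sample of i becomes the pair (i, j) with j uniform in {0, ..., k_i}.
    return [(x, rng.integers(splits.get(x, 0) + 1)) for x in samples]

# Example usage on a distribution whose heavy element dominates the l2 norm.
rng = np.random.default_rng(0)
p = np.array([0.9, 0.05, 0.05])
draws = rng.choice(len(p), size=2_000, p=p)
reference, rest = draws[:1_000], draws[1_000:]
flat = flatten(reference, rest, rng)  # samples from the flattened distribution
```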
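For point (2), the simplest instance of the polynomial-estimation problem is the degree-2 polynomial $Q(p) = \sum_i p_i^2$ (the collision probability of $p$), for which the classical collision statistic is an unbiased estimator. The sketch below illustrates only this special case, not the general degree-$d$ estimator constructed in the paper.

```python
import numpy as np

def unbiased_collision_estimate(samples):
    """Unbiased estimator of Q(p) = sum_i p_i^2 from i.i.d. samples:
    the fraction of colliding unordered sample pairs satisfies
    E[#collisions / C(m, 2)] = sum_i p_i^2."""
    m = len(samples)
    _, counts = np.unique(samples, return_counts=True)
    collisions = np.sum(counts * (counts - 1)) / 2  # unordered colliding pairs
    return collisions / (m * (m - 1) / 2)

# Example usage: estimate the collision probability of a distribution on [n].
rng = np.random.default_rng(0)
p = np.array([0.5, 0.25, 0.25])
samples = rng.choice(len(p), size=10_000, p=p)
print(unbiased_collision_estimate(samples))  # close to sum_i p_i^2 = 0.375
```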