Sketching information divergences

Cited by: 10
Authors
Guha, Sudipto [2 ]
Indyk, Piotr [3 ]
McGregor, Andrew [1 ]
Affiliations
[1] Univ Calif San Diego, Informat Theory & Applicat Ctr, San Diego, CA 92109 USA
[2] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[3] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Keywords
Information divergences; Data stream model; Sketches; Communication complexity; Approximation algorithms
DOI
10.1007/s10994-008-5054-x
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
When comparing discrete probability distributions, natural measures of similarity are not ℓp distances but rather information divergences such as Kullback-Leibler and Hellinger. This paper considers some of the issues related to constructing small-space sketches of distributions in the data-stream model, a concept related to dimensionality reduction, such that these measures can be approximated from the sketches. Related problems for ℓp distances are reasonably well understood via a series of results by Johnson and Lindenstrauss (Contemp. Math. 26:189-206, 1984), Alon et al. (J. Comput. Syst. Sci. 58(1):137-147, 1999), Indyk (IEEE Symposium on Foundations of Computer Science, pp. 202-208, 2000), and Brinkman and Charikar (IEEE Symposium on Foundations of Computer Science, pp. 514-523, 2003). In contrast, almost no analogous results are known to date about constructing sketches for the information divergences used in statistics and learning theory. Our main result is an impossibility result showing that no small-space sketches exist for the multiplicative approximation of any commonly used f-divergence or Bregman divergence, with the notable exceptions of ℓ1 and ℓ2, for which small-space sketches do exist. We then present data-stream algorithms for the additive approximation of a wide range of information divergences. Throughout, our emphasis is on providing general characterizations.
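To make the objects in the abstract concrete, the following sketch (a minimal NumPy illustration under our own assumptions, not the paper's construction; the function names, the sketch size k, and the Dirichlet test distributions are ours) computes the KL and Hellinger divergences exactly, then contrasts them with the standard random-sign (AMS-style) linear sketch, which recovers the ℓ2 distance from k numbers, one of the two cases the abstract identifies as sketchable in small space:

```python
import numpy as np

# Exact divergence computations for discrete distributions.
def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def hellinger(p, q):
    """Hellinger distance (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2."""
    return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2))

def sign_sketch(x, signs):
    """Linear sketch: project x onto k random +/-1 vectors (rows of `signs`)."""
    return signs @ x

rng = np.random.default_rng(seed=0)
n, k = 1000, 200                       # dimension n, sketch size k << n
p = rng.dirichlet(np.ones(n))          # two random test distributions
q = rng.dirichlet(np.ones(n))

signs = rng.choice([-1.0, 1.0], size=(k, n))
sp, sq = sign_sketch(p, signs), sign_sketch(q, signs)

# The mean squared coordinate difference of the two sketches is an
# unbiased estimator of ||p - q||_2^2, recoverable from the sketches alone.
est = float(np.mean((sp - sq) ** 2))

print(f"true ||p-q||_2^2 : {np.sum((p - q) ** 2):.6f}")
print(f"sketch estimate  : {est:.6f}")
print(f"KL(p||q)         : {kl_divergence(p, q):.6f}")
print(f"Hellinger(p,q)   : {hellinger(p, q):.6f}")
```

Since KL and Hellinger are commonly used f-divergences, the paper's impossibility result implies that no analogous small summary supports a multiplicative approximation of them; its positive results instead give additive guarantees.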
Pages: 5-19
Number of pages: 15