Sketching information divergences

Cited by: 10
Authors
Guha, Sudipto [2]
Indyk, Piotr [3]
McGregor, Andrew [1]
Affiliations
[1] Univ Calif San Diego, Informat Theory & Applicat Ctr, San Diego, CA 92109 USA
[2] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[3] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Keywords
Information divergences; Data stream model; Sketches; Communication complexity; Approximation algorithms;
DOI
10.1007/s10994-008-5054-x
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When comparing discrete probability distributions, natural measures of similarity are not ℓ_p distances but rather are information divergences such as Kullback-Leibler and Hellinger. This paper considers some of the issues related to constructing small-space sketches of distributions in the data-stream model, a concept related to dimensionality reduction, such that these measures can be approximated from the sketches. Related problems for ℓ_p distances are reasonably well understood via a series of results by Johnson and Lindenstrauss (Contemp. Math. 26:189-206, 1984), Alon et al. (J. Comput. Syst. Sci. 58(1):137-147, 1999), Indyk (IEEE Symposium on Foundations of Computer Science, pp. 202-208, 2000), and Brinkman and Charikar (IEEE Symposium on Foundations of Computer Science, pp. 514-523, 2003). In contrast, almost no analogous results are known to date about constructing sketches for the information divergences used in statistics and learning theory. Our main result is an impossibility result that shows that no small-space sketches exist for the multiplicative approximation of any commonly used f-divergences and Bregman divergences with the notable exceptions of ℓ_1 and ℓ_2 where small-space sketches exist. We then present data-stream algorithms for the additive approximation of a wide range of information divergences. Throughout, our emphasis is on providing general characterizations.
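To make the contrast in the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithms. It computes two of the divergences named above exactly, and shows a Johnson-Lindenstrauss-style Gaussian random projection as one standard way to sketch the ℓ_2 distance. The function names, the Hellinger normalization, the sketch size `k`, and the shared `seed` are all assumptions made for illustration.

```python
import math
import random


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.

    Returns infinity when some q_i is 0 while p_i > 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q_i) is taken to be 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def hellinger_squared(p, q):
    """Squared Hellinger distance, using the normalization
    sum_i (sqrt(p_i) - sqrt(q_i))^2 (an assumption; other texts add a 1/2)."""
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))


def l2_sketch(x, k, seed):
    """Linear sketch of x: k Gaussian random projections scaled by 1/sqrt(k).

    Both vectors must be sketched with the same seed so that they share
    the same (implicit) random matrix.
    """
    rng = random.Random(seed)
    sketch = [0.0] * k
    for xi in x:
        for j in range(k):
            # Draw the matrix entry even when xi == 0 to keep the stream aligned.
            sketch[j] += rng.gauss(0.0, 1.0) * xi
    return [s / math.sqrt(k) for s in sketch]


def estimate_l2_distance(sketch_p, sketch_q):
    """Estimate ||p - q||_2 from the two sketches alone."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sketch_p, sketch_q)))


if __name__ == "__main__":
    p = [0.1] * 10
    q = [0.19] + [0.09] * 9
    exact = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    approx = estimate_l2_distance(l2_sketch(p, 400, seed=7), l2_sketch(q, 400, seed=7))
    print("exact l2:", exact, "sketched l2:", approx)
    print("KL(p||q):", kl_divergence(p, q), "Hellinger^2:", hellinger_squared(p, q))
```

Because the sketch is linear, the difference of the two sketches equals the sketch of `p - q`, which is why the ℓ_2 distance can be estimated from the sketches alone. The abstract's impossibility result says that no comparably small sketch supports multiplicative approximation of divergences such as Kullback-Leibler or Hellinger, so the first two functions here work from the full distributions.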
Pages: 5-19
Number of pages: 15
Related papers
50 in total
  • [1] Sketching information divergences
    Guha, Sudipto
    Indyk, Piotr
    McGregor, Andrew
    LEARNING THEORY, PROCEEDINGS, 2007, 4539 : 424 - +
  • [2] Sketching information divergences
    Sudipto Guha
    Piotr Indyk
    Andrew McGregor
    Machine Learning, 2008, 72 : 5 - 19
  • [3] Transport information Bregman divergences
    Li W.
    Information Geometry, 2021, 4 (2) : 435 - 470
  • [4] MEASURES OF INFORMATION ASSOCIATED WITH CSISZARS DIVERGENCES
    SALICRU, M
    KYBERNETIKA, 1994, 30 (05) : 563 - 573
  • [5] Proximity Operators of Discrete Information Divergences
    El Gheche, Mireille
    Chierchia, Giovanni
    Pesquet, Jean-Christophe
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2018, 64 (02) : 1092 - 1104
  • [6] Information literacy and IT fluency - Convergences and divergences
    Gibson, Craig
    Arp, Lori
    Woodard, Beth S.
    REFERENCE & USER SERVICES QUARTERLY, 2007, 46 (03) : 23 - +
  • [7] Normalized information-based divergences
    Coeurjolly, J. -F.
    Drouilhet, R.
    Robineau, J. -F.
    PROBLEMS OF INFORMATION TRANSMISSION, 2007, 43 (03) : 167 - 189
  • [8] Normalized information-based divergences
    J. -F. Coeurjolly
    R. Drouilhet
    J. -F. Robineau
    Problems of Information Transmission, 2007, 43 : 167 - 189
  • [9] On divergences and informations in statistics and information theory
    Liese, Friedrich
    Vajda, Igor
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2006, 52 (10) : 4394 - 4412
  • [10] A spatial information retrieval system based on sketching
    Moultazem, Ghazal
    Florence, Sèdes
    CORIA 2010: Actes de la COnference en Recherche d'Information et Applications - Proceedings of the Conference on Information Retrieval and Applications, 2010, : 337 - 348