Universal compression of memoryless sources over unknown alphabets

Cited by: 94
Authors
Orlitsky, A [1]
Santhanam, NP
Zhang, JA
Affiliations
[1] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
Funding
National Science Foundation (USA);
Keywords
large and unknown alphabets; patterns; set and integer partitions; universal compression;
DOI
10.1109/TIT.2004.830761
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols, and its pattern (the order in which the symbols appear). Concentrating on the latter, we show that the patterns of i.i.d. strings over all, including infinite and even unknown, alphabets, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the Good-Turing probability-estimation problem.
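The counting claims in the abstract can be checked on small cases. The following is a minimal sketch, not the authors' code: it assumes the usual reading of "pattern" (each symbol is replaced by the 1-based order of its first appearance), and the function names pattern, stirling2, bell and count_patterns are hypothetical names chosen for illustration.

```python
from itertools import product


def pattern(s):
    """Replace each symbol by the 1-based order of its first appearance."""
    index = {}
    out = []
    for ch in s:
        if ch not in index:
            index[ch] = len(index) + 1
        out.append(index[ch])
    return tuple(out)


def stirling2(n, m):
    """Stirling number of the second kind, via the standard recurrence."""
    if n == m:
        return 1
    if m == 0 or m > n:
        return 0
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)


def bell(n):
    """Bell number as the sum of Stirling numbers over all block counts."""
    return sum(stirling2(n, m) for m in range(n + 1))


def count_patterns(n, m=None):
    """Brute force over all length-n strings on n symbols.

    Counts distinct patterns; if m is given, only patterns that use
    exactly m distinct symbols are counted.
    """
    seen = {pattern(w) for w in product(range(n), repeat=n)}
    if m is not None:
        seen = {p for p in seen if max(p) == m}
    return len(seen)


if __name__ == "__main__":
    print(pattern("abracadabra"))
    for n in range(1, 6):
        print(n, count_patterns(n), bell(n))
```

An alphabet of n symbols suffices for the brute-force count because a string of length n contains at most n distinct symbols. The enumeration grows as n^n, so it is only practical for very small n.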
Pages: 1469-1481
Number of pages: 13
Related papers
50 in total
  • [1] Universal compression of memoryless sources over unknown alphabets
    Orlitsky, Alon
    Santhanam, Narayana P.
    Zhang, Junan
    [J]. IEEE Trans. Inf. Theory, 7 (1469-1481):
  • [2] Universal Compression of Memoryless Sources over Large Alphabets via Independent Component Analysis
    Painsky, Amichai
    Rosset, Saharon
    Feder, Meir
    [J]. 2015 DATA COMPRESSION CONFERENCE (DCC), 2015, : 213 - 222
  • [3] Universal compression of unknown alphabets
    Jevtic, N
    Orlitsky, A
    Santhanam, N
    [J]. ISIT: 2002 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, PROCEEDINGS, 2002, : 320 - 320
  • [4] Universal Coding for Memoryless Sources with Countably Infinite Alphabets
    Kudryashov, B. D.
    Porov, A. V.
    [J]. PROBLEMS OF INFORMATION TRANSMISSION, 2014, 50 (04) : 390 - 399
  • [5] Universal coding for memoryless sources with countably infinite alphabets
    B. D. Kudryashov
    A. V. Porov
    [J]. Problems of Information Transmission, 2014, 50 : 390 - 399
  • [6] Universal compression of Markov and related sources over arbitrary alphabets
    Dhulipala, Anand K.
    Orlitsky, Alon
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2006, 52 (09) : 4182 - 4190
  • [7] On Redundancy of Memoryless Sources Over Countable Alphabets
    Hosseini, Maryam
    Santhanam, Narayana
    [J]. 2014 INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY AND ITS APPLICATIONS (ISITA), 2014, : 299 - 303
  • [8] Universal compression for IID sources with large alphabets
    Shamir, GI
    [J]. 2003 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY - PROCEEDINGS, 2003, : 24 - 24
  • [9] On Universal D-Semifaithful Coding for Memoryless Sources With Infinite Alphabets
    Silva, Jorge F.
    Piantanida, Pablo
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (04) : 2782 - 2800
  • [10] Universal lossless compression with unknown alphabets - The average case
    Shamir, Gil I.
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2006, 52 (11) : 4915 - 4944