Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression

Cited by: 9
Authors
Ma, Yi
Derksen, Harm
Hong, Wei
Wright, John
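Affiliations
[1] University of Illinois, Department of Electrical and Computer Engineering, Coordinated Science Laboratory, Urbana, IL 61801, USA
[2] University of Michigan, Department of Mathematics, Ann Arbor, MI 48109, USA
[3] Texas Instruments Inc., DSP Solutions Research and Development Center, Dallas, TX 75266, USA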
Funding
National Science Foundation (USA)
Keywords
multivariate mixed data; data segmentation; data clustering; rate distortion; lossy coding; lossy compression; image segmentation; microarray data clustering
DOI
10.1109/TPAMI.2007.1085
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression and rate-distortion theory. We show that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm that depends on a single parameter, the allowable distortion. At any given distortion, the algorithm automatically determines the corresponding number and dimension of the groups and does not involve any parameter estimation. Simulation results reveal intriguing phase-transition-like behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.
Pages: 1546-1562
Number of pages: 17
Related papers (50 in total)
  • [1] Segmentation of multivariate mixed data via lossy coding and compression
    Derksen, Harm
    Ma, Yi
    Hong, Wei
    Wright, John
    VISUAL COMMUNICATIONS AND IMAGE PROCESSING 2007, PTS 1 AND 2, 2007, 6508
  • [2] Unsupervised segmentation of natural images via lossy data compression
    Yang, Allen Y.
    Wright, John
    Ma, Yi
    Sastry, S. Shankar
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2008, 110 (02): 212-225
  • [3] Transform coding techniques for lossy hyperspectral data compression
    Penna, Barbara
    Tillo, Tammam
    Magli, Enrico
    Olmo, Gabriella
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2007, 45 (05): 1408-1421
  • [4] A coding theorem for lossy data compression by LDPC codes
    Matsunaga, Y
    Yamamoto, H
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2003, 49 (09): 2225-2229
  • [5] A coding theorem for lossy data compression by LDPC codes
    Matsunaga, Y
    Yamamoto, H
    ISIT: 2002 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, PROCEEDINGS, 2002: 461
  • [6] Pointwise redundancy in lossy data compression and universal lossy data compression
    Kontoyiannis, I
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2000, 46 (01): 136-152
  • [7] Mismatched codebooks and the role of entropy coding in lossy data compression
    Kontoyiannis, I
    Zamir, R
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2006, 52 (05): 1922-1938
  • [8] Lossless coding and compression of mixed data types
    Stearns, SD
    McDonald, TS
    THIRTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2, 1998: 1451-1454
  • [9] LFZip: Lossy compression of multivariate floating-point time series data via improved prediction
    Chandak, Shubham
    Tatwawadi, Kedar
    Wen, Chengtao
    Wang, Lingyun
    Ojea, Juan Aparicio
    Weissman, Tsachy
    2020 DATA COMPRESSION CONFERENCE (DCC 2020), 2020: 342-351
  • [10] Mismatched codebooks and the role of entropy-coding in lossy data compression
    Kontoyiannis, I
    Zamir, R
    2003 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY - PROCEEDINGS, 2003: 167