Sector-based compression and compression strategy selection method for column stores

Cited by: 0
Authors
Wang Z.-X. [1]
Le J.-J. [1]
Wang M. [1]
Liu G.-H. [1, 2]
Affiliations
[1] School of Computer Science and Technology, Donghua University
[2] State Key Laboratory for Novel Software Technology, Nanjing University
Source
Keywords
Column store; Compression strategies; Data compression; Sector-based compression
DOI
10.3724/SP.J.1016.2010.01523
Chinese Library Classification (CLC) number
TP392 [Various special-purpose databases]
Discipline classification code
Abstract
Compression is an important research topic in column-oriented database management systems. However, most previous compression techniques for column-oriented data use the same algorithm for all columns and ignore the local distribution of the data, which greatly degrades compression performance. This paper proposes a sector-based compression pattern and, under this pattern, a novel learning-based compression strategy selection method for column stores. First, each data column is divided into sectors. For a given sector, the information of its neighboring sector and the statistical information of the column containing it are extracted as two references. Then, a recommended compression strategy is obtained by learning the similarity between the references and the given sector. Finally, the recommended strategy is refined by partially learning the given sector, to guarantee its effectiveness. Experimental results on the data warehouse benchmark data set SSB demonstrate the effectiveness of the proposed method.
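The abstract gives no implementation details, so the following Python sketch only illustrates the general idea of choosing a compression strategy per sector and is not the authors' method. Everything specific in it is an assumption made for illustration: the fixed sector size, the three candidate strategies (run-length, dictionary, plain), the rough cost formulas, and the function names. In particular, the paper's learned similarity between references and sector is replaced here by a simple cost estimate on a sample prefix of the sector, with the neighboring sector's choice used only as a tie-breaker.

```python
from itertools import groupby
from math import ceil, log2


def split_into_sectors(column, sector_size):
    """Cut a column (a list of values) into consecutive fixed-size sectors."""
    return [column[i:i + sector_size] for i in range(0, len(column), sector_size)]


def rle_cost(values):
    """Rough run-length cost: one (value, run length) pair per run."""
    return 2 * sum(1 for _ in groupby(values))


def dict_cost(values):
    """Rough dictionary cost: dictionary entries plus fixed-width codes.

    Assumes 32-bit raw values, so a code of `bits` bits costs bits / 32 units.
    """
    distinct = len(set(values))
    bits = max(1, ceil(log2(distinct)))
    return distinct + len(values) * bits / 32


def plain_cost(values):
    """Cost of storing the values uncompressed: one unit per value."""
    return len(values)


# Hypothetical candidate set; the paper's actual strategies are not listed.
STRATEGIES = {"rle": rle_cost, "dict": dict_cost, "plain": plain_cost}


def choose_strategy(sector, neighbor_choice=None, sample_fraction=0.5):
    """Pick the cheapest strategy on a sample prefix of the sector.

    The neighboring sector's choice is kept when it is no worse on the sample.
    """
    sample = sector[:max(1, int(len(sector) * sample_fraction))]
    costs = {name: cost(sample) for name, cost in STRATEGIES.items()}
    best = min(costs, key=costs.get)
    if neighbor_choice is not None and costs[neighbor_choice] <= costs[best]:
        return neighbor_choice
    return best


def select_strategies(column, sector_size):
    """Return one strategy name per sector, scanning the column in order."""
    choices = []
    previous = None
    for sector in split_into_sectors(column, sector_size):
        previous = choose_strategy(sector, previous)
        choices.append(previous)
    return choices


if __name__ == "__main__":
    column = [7] * 128 + list(range(128)) + [1, 2] * 64
    print(select_strategies(column, sector_size=128))
```

Inspecting only a sample prefix mirrors the abstract's "partially learning the given sector": the whole sector is never scanned just to pick a strategy. A real system would likely replace the cost formulas with measured compression ratios or a trained model.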
Pages: 1523-1530
Page count: 7