A Data Reusing Strategy Based on Column-Stores

Cited by: 0
Authors
Wang, Mei [1 ]
Zhou, Jiaoling [1 ]
Li, Yue [1 ]
Xia, Xiaoling [1 ]
Le, Jiajin [1 ]
Affiliation
[1] Donghua Univ, Shanghai, Peoples R China
Keywords
massive data; data reusing; column-store; schema mapping;
DOI
10.1109/DASC.2013.56
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Data reuse is an important way to save storage capacity and improve query efficiency when managing massive data. The column-store architecture stores the values of each column contiguously, which greatly improves the performance of read-optimized applications and also makes data reuse more feasible and flexible. In this paper, we propose a novel reuse method for column-store data warehouses. First, we propose an improved iMAP method, based on schema mapping, that generates as many candidate reusable columns as possible and then filters these candidates further, which greatly reduces the complexity of detecting reusable data. Building on the column-store architecture, we then describe how reuse is implemented at the storage layer. Finally, we provide a method for executing queries over reusable data. Experiments on real data sets indicate that the proposed strategy effectively reduces storage space and query execution time.
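The pipeline outlined in the abstract (detect candidate reusable columns, store shared data once, answer queries through the shared copy) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the improved iMAP schema-mapping step is replaced here by a much simpler assumption, namely that two columns are reusable only when their stored values are identical, and all names (`find_reusable_columns`, `build_shared_store`, `read_column`, the sample tables) are hypothetical.

```python
from collections import defaultdict


def find_reusable_columns(tables):
    """Group columns whose stored values are identical.

    `tables` maps table name -> {column name -> list of values}.
    Exact equality is a simplifying assumption standing in for
    the schema-mapping and filtering steps named in the abstract.
    """
    groups = defaultdict(list)
    for table_name, columns in tables.items():
        for column_name, values in columns.items():
            groups[tuple(values)].append((table_name, column_name))
    # Only groups with at least two members offer any reuse.
    return [members for members in groups.values() if len(members) > 1]


def build_shared_store(tables):
    """Keep one physical copy per distinct column and map logical columns to it."""
    storage = {}    # physical column id -> values
    catalog = {}    # (table, column) -> physical column id
    id_by_values = {}
    for table_name, columns in tables.items():
        for column_name, values in columns.items():
            key = tuple(values)
            if key not in id_by_values:
                id_by_values[key] = len(storage)
                storage[id_by_values[key]] = list(values)
            catalog[(table_name, column_name)] = id_by_values[key]
    return storage, catalog


def read_column(storage, catalog, table_name, column_name):
    """Resolve a logical column to its (possibly shared) physical copy."""
    return storage[catalog[(table_name, column_name)]]


# Hypothetical sample data: two tables sharing one identical column.
tables = {
    "orders": {"customer_id": [1, 2, 3], "amount": [10, 20, 30]},
    "invoices": {"cust": [1, 2, 3], "total": [10, 25, 30]},
}
reusable = find_reusable_columns(tables)
storage, catalog = build_shared_store(tables)
```

In this sample, the four logical columns occupy three physical columns, because `orders.customer_id` and `invoices.cust` resolve to the same stored copy.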
Pages: 163-168
Number of pages: 6
Related papers
50 in total
  • [1] Vectorized UDFs in Column-Stores
    Raasveldt, Mark
    Muhleisen, Hannes
    [J]. 28TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT (SSDBM 2016), 2016
  • [2] Self-organizing Tuple Reconstruction in Column-stores
    Idreos, Stratos
    Kersten, Martin L.
    Manegold, Stefan
    [J]. ACM SIGMOD/PODS 2009 CONFERENCE, 2009, : 297 - 308
  • [3] Holistic Indexing in Main-memory Column-stores
    Petraki, Eleni
    Idreos, Stratos
    Manegold, Stefan
    [J]. SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015, : 1153 - 1166
  • [4] Optimizing Parallel Join of Column-stores on Heterogeneous Computing Platforms
    Ding Xiangwu
    Chen Jinxin
    [J]. 2016 IEEE INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC), 2016, : 621 - 625
  • [5] Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads
    Arulraj, Joy
    Pavlo, Andrew
    Menon, Prashanth
    [J]. SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, : 583 - 598
  • [6] A data reusing strategy in column-store data warehouse
    [J]. Wang, M. (wangmei@dhu.edu.cn), 1626, Science Press (36):
  • [7] Fast Multi-Column Sorting in Main-Memory Column-Stores
    Xu, Wenjian
    Feng, Ziqiang
    Lo, Eric
    [J]. SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, : 1263 - 1278
  • [8] Hardware-Oblivious Parallelism for In-Memory Column-Stores
    Heimel, Max
    Saecker, Michael
    Pirk, Holger
    Manegold, Stefan
    Markl, Volker
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (09): : 709 - 720
  • [9] Nimble join: A parallel star join for main memory column-stores
    Sangat, Prajwol
    Taniar, David
    Indrawan-Santiago, Maria
    Messom, Christopher
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (08):
  • [10] A Data Reusing Strategy Based On Hive
    Xie, Heng
    Wang, Mei
    Le, Jiajin
    [J]. 2014 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2014, : 367 - 373