Online Expansion of Large-scale Data Warehouses

Authors
Cohen, Jeffrey [1]
Eshleman, John [1]
Hagenbuch, Brian [1]
Kent, Joy [1]
Pedrotti, Christopher [1]
Sherry, Gavin [1]
Waas, Florian [1]
Affiliations
[1] EMC Corp, Data Comp Div, Hopkinton, MA 01748 USA
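Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2011 / Vol. 4 / No. 12
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;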
Abstract
Modern data warehouses store exceedingly large amounts of data, generally considered the crown jewels of an enterprise. The amount of data maintained in such data warehouses increases significantly over time, often at a continuous pace, e.g., by gathering additional data or retaining data for longer periods to derive additional business value, but occasionally also precipitously, e.g., when consolidating disparate data warehouses and data marts into a single database. Having to expand a data warehouse with hundreds of TB of data by a substantial portion, e.g., 100% or more, is a complex and disruptive maintenance operation, as it typically involves some sort of dumping and reloading of data, which requires substantial downtime. In this paper we describe the methodology and mechanisms we developed in Greenplum Database to expand large-scale data warehouses in an online fashion, i.e., without noticeable downtime. At the core of our approach is a set of robust and transactionally consistent primitives that enable efficient data movement. Special emphasis was put on usability and control, letting an administrator tailor the expansion process to specific operational characteristics via priorities and schedules. We present a number of experiments to quantify the impact of an ongoing expansion on query workloads.
Pages: 1249-1259
Page count: 11