DuoSQL: towards elastic data warehousing via separated data management and processing

Authors
Zhang, Weikang [1]
Liu, Zhi [2]
Bai, Tongxin [3]
Zheng, Furong [4]
Jin, Wenming [4]
Wang, Yang [2]
Affiliations
[1] Southern Univ Sci & Technol, Coll Engn, Shenzhen 518055, Guangdong, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518000, Guangdong, Peoples R China
[3] Beijing Acad Artificial Intelligence, Beijing 100084, Peoples R China
[4] SIAT Suntang Big Data & AI Joint Innovat Lab, Shenzhen 518052, Guangdong, Peoples R China
Keywords
TUNING SYSTEM; AWARE;
DOI
10.1093/comjnl/bxaf014
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Companies increasingly move data warehouses (DWs) to the cloud in pursuit of cost-effective data management. To achieve this goal fully, a cloud DW system must adjust its resource provisioning as workload requirements change. However, the traditional data warehousing architecture lacks the flexibility for on-demand resource control, which severely restricts cost optimization and quality of service for both cloud providers and users. Building cloud DWs therefore calls for new architectures. This paper explores an architecture that decouples data management from data processing to enable on-demand resource control, which enhances system elasticity and adaptability. The separation is not free, however: cooperation overhead between the two layers can be high if it is not well optimized. As a proof of concept, we build a prototype system, DuoSQL, that uses PostgreSQL for data management and Spark for data processing. To optimize their cooperation, we tune the parameters of both systems jointly to improve overall system performance. We validate the system with the TPC-H benchmark. The results show that the decoupling approach is flexible and offers significant performance potential.
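The joint parameter tuning mentioned in the abstract can be pictured with a minimal sketch. It is not the authors' method: the record does not say which parameters DuoSQL tunes or which search strategy it uses. The sketch assumes an exhaustive grid search over a combined PostgreSQL and Spark configuration space. The candidate values and the function `toy_cost` are hypothetical; `toy_cost` is a synthetic stand-in for a measured benchmark runtime, not real data.

```python
from itertools import product

# Hypothetical candidate values. work_mem (PostgreSQL) and
# spark.sql.shuffle.partitions (Spark) are real settings, but whether the
# paper tunes them is an assumption made only for this illustration.
PG_SPACE = {"work_mem_mb": [4, 64, 256]}
SPARK_SPACE = {"shuffle_partitions": [50, 200, 800]}


def toy_cost(cfg):
    """Synthetic cost standing in for a measured runtime (not real data).

    The last term couples the two systems, so the best value of one
    parameter depends on the other; this is the case joint tuning targets.
    """
    mem = cfg["postgres"]["work_mem_mb"]
    parts = cfg["spark"]["shuffle_partitions"]
    return (abs(mem - 64) / 64
            + abs(parts - 200) / 200
            + abs(mem * parts - 12800) / 12800)


def grid(space):
    """Yield every combination of values in a {name: [values]} space."""
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def joint_tune(pg_space, spark_space, cost):
    """Search both spaces together and return (best_config, best_cost)."""
    best_cfg, best_cost = None, float("inf")
    for pg in grid(pg_space):
        for spark in grid(spark_space):
            cfg = {"postgres": pg, "spark": spark}
            c = cost(cfg)
            if c < best_cost:
                best_cfg, best_cost = cfg, c
    return best_cfg, best_cost


if __name__ == "__main__":
    cfg, c = joint_tune(PG_SPACE, SPARK_SPACE, toy_cost)
    print(cfg, c)
```

In a real setting the cost function would run the workload (for example, TPC-H queries) under each configuration, and an exhaustive grid would likely be replaced by a cheaper search strategy.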
Pages: 13