Leveraging 24/7 Availability and Performance for Distributed Real-Time Data Warehouses

Cited by: 2
Authors
Santos, Ricardo Jorge [1]
Bernardino, Jorge [2]
Vieira, Marco [1]
Affiliations
[1] Univ Coimbra, FCTUC, DEI, CISUC, Coimbra, Portugal
[2] Polytechn Inst Coimbra, CISUC, DEIS, ISEC, Coimbra, Portugal
Keywords
Real-time data warehousing; availability; fault tolerance; data replication and redundancy; distributed and parallel databases; load balancing; performance optimization
DOI
10.1109/COMPSAC.2012.92
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Real-time Data Warehouses (DWs) must be able to deal with continuous updates while ensuring 24/7 availability. To improve their performance, data is normally distributed across clusters of shared-nothing machines using round-robin algorithms. This paper proposes a solution for distributed DW databases that ensures their continuous availability and deals with frequent data loading requirements, while adding little performance overhead. We use a data striping and replication architecture to distribute portions of each fact table among pairs of slave nodes, where each slave node is an exact replica of its partner. This allows balancing query execution and replacing any defective node, ensuring the system's continuous availability. The size of each portion in a given node depends on its individual features, namely performance benchmark measures and dedicated database RAM. The estimated cost for executing each query workload in each slave node is also used for balancing query performance. We include experiments using the TPC-H decision support benchmark to evaluate the scalability of the proposed solution and show that it outperforms standard round-robin distributed DW setups.
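The abstract does not give the sizing formula, so the following is only an illustrative sketch of the idea it describes: fact-table rows are split among replicated node pairs in proportion to each pair's capability, and a query is sent to whichever replica has the lower estimated cost. The names `NodePair`, `allocate_rows`, `pick_replica`, and the mixing parameter `alpha` are hypothetical, and the linear blend of benchmark score and RAM is an assumption, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePair:
    """Two slave nodes that hold identical copies of the same portion."""
    name: str
    benchmark_score: float  # assumed: higher means a faster pair
    db_ram_gb: float        # RAM dedicated to the database


def allocate_rows(total_rows, pairs, alpha=0.5):
    """Split total_rows among pairs by a weighted share (assumed formula).

    Each pair's share blends its fraction of the total benchmark score
    (weight alpha) with its fraction of the total database RAM
    (weight 1 - alpha). Rounding uses the largest-remainder method so
    the portions always add up to total_rows.
    """
    total_score = sum(p.benchmark_score for p in pairs)
    total_ram = sum(p.db_ram_gb for p in pairs)
    shares = [
        alpha * p.benchmark_score / total_score
        + (1 - alpha) * p.db_ram_gb / total_ram
        for p in pairs
    ]
    exact = [total_rows * s for s in shares]
    rows = [int(x) for x in exact]
    leftover = total_rows - sum(rows)
    # Hand the remaining rows to the pairs with the largest fractional parts.
    by_remainder = sorted(
        range(len(pairs)), key=lambda i: exact[i] - rows[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        rows[i] += 1
    return {p.name: r for p, r in zip(pairs, rows)}


def pick_replica(estimated_costs):
    """Return the index of the replica with the lowest estimated query cost."""
    return min(range(len(estimated_costs)), key=estimated_costs.__getitem__)


if __name__ == "__main__":
    pairs = [NodePair("pair_a", 100.0, 8.0), NodePair("pair_b", 300.0, 24.0)]
    print(allocate_rows(1000, pairs))
    print(pick_replica([12.5, 9.0]))
```

Because both nodes in a pair store the same portion, either one can answer a query over it or stand in for a failed partner; a plain round-robin setup would instead give every node an equal share regardless of its capability.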
Pages: 654-659
Page count: 6