Refreshing data warehouses with near real-time updates

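Authors
Rahman, Nayem [1]
Affiliation
[1] Intel Corp, Business Intelligence Serv, Aloha, OR 97002 USA
Keywords
data warehouse; near real-time; real-time; observation timestamp; metadata; incremental updates
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract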
In traditional decision support systems, data warehouses have been used to analyze historical information. In the past it was relatively easy to keep data acquisition and maintenance activities to an as-needed basis by using batch windows at night when the business users went home. Now, however, decision makers need up-to-date information to make strategic business decisions, requiring data warehouses to be refreshed several times a day. This paper presents a technical outline for a near real-time decision support system where data warehouses are refreshed using a metadata model and incremental refreshes to increase the frequency of batch cycle runs. We propose a staging area in the data warehouse to capture data updates from external sources. Based on new data in the staging tables, we propose to load the actual analytical tables in the data warehouse using the database system as a transformation engine. We also propose making the database transformation tasks, such as stored procedures execution, metadata driven. The metadata model lets the stored procedures in different business and analytical subject areas run only when source data changes in the source subject area tables, and then implements a delta refresh of tables for which new data has arrived from the operational databases. Skipping unnecessary loads via this metadata-driven approach allows for faster cycle refreshes. The cycle refresh time statistics captured from an actual production data warehouse demonstrate the excellent reductions in cycle times achieved by our batch technique.
Pages: 71 - 80
Page count: 10
Related papers
50 in total
  • [1] Query optimisation in real-time data warehouses
    Hamdi I.
    Bouazizi E.
    Feki J.
    International Journal of Intelligent Information and Database Systems, 2019, 12 (04) : 245 - 278
  • [2] Scheduling to Minimize Staleness and Stretch in Real-Time Data Warehouses
    Bateni, MohammadHossein
    Golab, Lukasz
    Hajiaghayi, MohammadTaghi
    Karloff, Howard
    THEORY OF COMPUTING SYSTEMS, 2011, 49 (04) : 757 - 780
  • [3] Scheduling to Minimize Staleness and Stretch in Real-Time Data Warehouses
    MohammadHossein Bateni
    Lukasz Golab
    MohammadTaghi Hajiaghayi
    Howard Karloff
    Theory of Computing Systems, 2011, 49 : 757 - 780
  • [4] Multi-objective scheduling for real-time data warehouses
    Thiele, Maik
    Bader, Andreas
    Lehner, Wolfgang
    COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT, 2009, 24 (03) : 137 - 151
  • [5] Scheduling to Minimize Staleness and Stretch in Real-Time Data Warehouses
    Bateni, MohammadHossein
    Golab, Lukasz
    Hajiaghayi, MohammadTaghi
    Karloff, Howard
    SPAA'09: PROCEEDINGS OF THE TWENTY-FIRST ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2009, : 29 - 38
  • [6] Dynamic Management of Materialized Views in Real-Time Data Warehouses
    Hamdi, Issam
    Bouazizi, Emna
    Feki, Jamel
    2014 6TH INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2014, : 168 - 173
  • [7] Real-time/near real-time recce wideband data links
    Robinson, R.S.
    Proceedings of SPIE - The International Society for Optical Engineering, (154-163):
  • [8] Real-Time Snapshot Maintenance with Incremental ETL Pipelines in Data Warehouses
    Qu, Weiping
    Basavaraj, Vinanthi
    Shankar, Sahana
    Dessloch, Stefan
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, 2015, 9263 : 217 - 228
  • [9] Copernicus reveals near real-time data
    Unknown
    ASTRONOMY & GEOPHYSICS, 2018, 59 (06) : 7 - 7
  • [10] Towards Near Real-Time Data Warehousing
    Chen, Li
    Rahayu, Wenny
    Taniar, David
    2010 24TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA), 2010, : 1150 - 1157