HYBRIDJOIN for Near-Real-Time Data Warehousing

Cited by: 13
Authors
Naeem, M. Asif [1]
Dobbie, Gillian [1]
Weber, Gerald [1]
Affiliation
[1] Univ Auckland, Dept Comp Sci, Auckland 1, New Zealand
Keywords
Data Transformation; Data Warehousing; Near-Real-Time; Performance and Tuning; JOIN
DOI
10.4018/jdwm.2011100102
Chinese Library Classification (CLC) Number
TP31 [Computer Software]
Discipline Classification Codes
081202; 0835
Abstract
An important component of near-real-time data warehouses is the near-real-time integration layer. One important element in near-real-time data integration is the join of a continuous input data stream with a disk-based relation. For high-throughput streams, stream-based algorithms such as Mesh Join (MESHJOIN) can be used. However, the performance of MESHJOIN is inversely proportional to the size of the disk-based relation. The Index Nested Loop Join (INLJ) can be set up to process stream input and can deal with intermittences in the update stream, but it has low throughput. This paper introduces a robust stream-based join algorithm called Hybrid Join (HYBRIDJOIN), which combines the two approaches. A theoretical result shows that HYBRIDJOIN is asymptotically as fast as the faster of the two algorithms. The authors present performance measurements of the implementation. In experiments using synthetic data based on a Zipfian distribution, HYBRIDJOIN performs significantly better for typical parameters of the Zipfian distribution and in general performs in accordance with the theoretical model, while the other two algorithms are unacceptably slow under different settings.
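The abstract describes HYBRIDJOIN only as a combination of MESHJOIN and INLJ. The Python sketch below illustrates one way such a hybrid stream/relation join can work, under simplifying assumptions that are not taken from the paper: buffered stream tuples sit in a hash table plus an arrival-order queue; the join key of the oldest waiting tuple is looked up through an index on the disk-based relation (the INLJ idea); and the whole partition read at that position is then probed against every buffered tuple, so one disk read serves many stream tuples (the MESHJOIN idea of amortizing reads). The "disk" relation is simulated as an in-memory list sorted on a unique join key. The function name `hybrid_join_sketch`, the parameters `partition_size` and `hash_capacity`, and the rule of dropping stream tuples that have no matching record are all hypothetical choices made for illustration.

```python
from collections import OrderedDict
from typing import Dict, Iterable, Iterator, List, Tuple


def hybrid_join_sketch(
    stream: Iterable[Tuple[int, str]],
    relation: List[Tuple[int, str]],
    partition_size: int = 4,
    hash_capacity: int = 8,
) -> Iterator[Tuple[int, str, str]]:
    """Join stream tuples (key, payload) with relation records (key, attrs).

    Hypothetical illustration only: `relation` stands in for the disk-based
    relation and is assumed to be sorted on a unique join key.
    """
    if partition_size < 1 or hash_capacity < 1:
        raise ValueError("partition_size and hash_capacity must be >= 1")
    # Stand-in for a disk index: join key -> position in the sorted relation.
    index: Dict[int, int] = {key: pos for pos, (key, _) in enumerate(relation)}
    queue: "OrderedDict[int, int]" = OrderedDict()  # seq id -> key, oldest first
    hash_table: Dict[int, List[int]] = {}  # key -> seq ids of waiting tuples
    payloads: Dict[int, str] = {}  # seq id -> stream payload
    source = iter(stream)
    next_seq = 0
    exhausted = False

    while True:
        # Admit new stream tuples while the in-memory buffer has free slots.
        while not exhausted and len(queue) < hash_capacity:
            try:
                key, payload = next(source)
            except StopIteration:
                exhausted = True
                break
            queue[next_seq] = key
            hash_table.setdefault(key, []).append(next_seq)
            payloads[next_seq] = payload
            next_seq += 1
        if not queue:
            return  # stream exhausted and nothing left waiting

        # INLJ-style step: the oldest waiting tuple picks the partition via the index.
        oldest_key = next(iter(queue.values()))
        start = index.get(oldest_key)
        if start is None:
            # Assumption: tuples with no matching record are dropped.
            for seq in hash_table.pop(oldest_key):
                del queue[seq]
                del payloads[seq]
            continue

        # MESHJOIN-style step: probe every record of the loaded partition
        # against all buffered stream tuples, so one read serves many tuples.
        for r_key, r_attrs in relation[start:start + partition_size]:
            for seq in hash_table.pop(r_key, []):
                del queue[seq]
                yield (r_key, payloads.pop(seq), r_attrs)


if __name__ == "__main__":
    master = [(k, f"r{k}") for k in range(10)]
    updates = [(3, "a"), (7, "b"), (3, "c"), (4, "d"), (42, "x")]
    for row in hybrid_join_sketch(updates, master, partition_size=4, hash_capacity=2):
        print(row)
```

Because each partition starts at the index position of the oldest waiting tuple, every iteration either matches or drops that tuple, so the loop always makes progress. The `OrderedDict` serves as the queue purely as a convenience of this sketch: matched tuples can be removed from the middle in constant time while the oldest waiting tuple stays at the front.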
Pages: 21-42
Page count: 22
Related Papers
  • [1] X-HYBRIDJOIN for Near-Real-Time Data Warehousing
    Naeem, Muhammad Asif
    Dobbie, Gillian
    Weber, Gerald
    [J]. Advances in Databases, 2011, 7051: 33-47
  • [2] Efficient Usage of Memory Resources in Near-Real-Time Data Warehousing
    Naeem, Muhammad Asif
    Dobbie, Gillian
    Weber, Gerald
    Bajwa, Imran Sarwar
    [J]. Emerging Trends and Applications in Information Communication Technologies, 2012, 281: 326+
  • [3] Efficient processing of streaming updates with archived master data in near-real-time data warehousing
    Naeem, M. Asif
    Dobbie, Gillian
    Weber, Gerald
    [J]. Knowledge and Information Systems, 2014, 40 (03): 615-637
  • [4] TinyLFU-based semi-stream cache join for near-real-time data warehousing
    Naeem, M. Asif
    Waqar, Wasiullah
    Mirza, Farhaan
    Tahir, Ali
    [J]. Soft Computing, 2022, 26 (20): 11091-11103
  • [5] Near-real-time applications of CloudSat Data
    Mitrescu, Cristian
    Miller, Steven
    Hawkins, Jeffrey
    L'Ecuyer, Tristan
    Turk, Joseph
    Partain, Philip
    Stephens, Graeme
    [J]. Journal of Applied Meteorology and Climatology, 2008, 47 (07): 1982-1994
  • [6] An introduction to the near-real-time QuikSCAT data
    Hoffman, RN
    Leidner, SM
    [J]. Weather and Forecasting, 2005, 20 (04): 476-493
  • [7] Towards Near Real-Time Data Warehousing
    Chen, Li
    Rahayu, Wenny
    Taniar, David
    [J]. 2010 24th IEEE International Conference on Advanced Information Networking and Applications (AINA), 2010: 1150-1157
  • [8] Near-real-time coastal oceanographic data products
    Corson, WD
    Sabol, MA
    [J]. OCEANS '96 MTS/IEEE, Conference Proceedings, Vols 1-3 / Supplementary Proceedings: Coastal Ocean - Prospects for the 21st Century, 1996: 790-793