A Robust Join Operator to Process Streaming Data in Real-Time Data Warehousing

Author
Naeem, M. Asif [1]
Affiliation
[1] Auckland Univ Technol, Sch Comp & Math Sci, Auckland, New Zealand
Keywords
Real-time data warehousing; Semi-stream processing; Join operator; Performance measurement
DOI
Not available
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In the field of real-time data warehousing, semi-stream processing has become a promising area of research over the last decade. One important operation in semi-stream processing is joining stream data with slowly changing, disk-based master data. This operation is usually implemented by a join operator that typically works with limited main memory, which is generally not large enough to hold all of the disk-based master data. Recently, a seminal join algorithm called MESHJOIN (Mesh Join) has been proposed in the literature to process semi-stream data, and it is a candidate for resource-aware system setups. However, MESHJOIN is not very selective: it does not consider the characteristics of the stream data, and its performance is suboptimal for skewed streams. In this paper, we propose a novel Semi-Stream Join (SSJ) that uses a new cache module. The algorithm is better suited to skewed distributions, and we present results for Zipfian distributions of the type that appears in many applications. We test the algorithm in a rigorous experimental study, which shows that SSJ significantly outperforms MESHJOIN. We also present a cost model for SSJ and validate it experimentally.
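The abstract's central argument is that a small in-memory cache pays off when stream join keys are skewed, because a few hot master tuples then answer most stream tuples without a disk access. The Python sketch below illustrates only that intuition, under loudly stated assumptions: the disk-based master data is simulated by a plain dict, the cache uses a simple LRU policy (an assumption; the abstract does not say how SSJ's cache module selects or evicts tuples), and the names `MASTER`, `zipf_keys` and `CachedSemiStreamJoin` are hypothetical. It is not the author's SSJ algorithm, and it does not implement MESHJOIN.

```python
import random
from collections import OrderedDict

# Hypothetical master relation: in a real warehouse it lives on disk and is
# read in pages; here a dict keyed by the join attribute stands in for it.
MASTER = {k: f"master_tuple_{k}" for k in range(1, 10_001)}


def zipf_keys(n, num_keys, s=1.0, seed=42):
    """Draw n stream join keys over 1..num_keys with P(k) proportional to 1 / k**s.

    s = 1.0 gives a Zipfian (skewed) stream; s = 0.0 gives a uniform one.
    """
    rng = random.Random(seed)
    keys = range(1, num_keys + 1)
    weights = [1.0 / (k ** s) for k in keys]
    return rng.choices(keys, weights=weights, k=n)


class CachedSemiStreamJoin:
    """Hypothetical cache-assisted semi-stream join (LRU policy is an assumption)."""

    def __init__(self, master, cache_size):
        self.master = master          # slow, disk-based relation (simulated)
        self.cache = OrderedDict()    # join key -> master tuple, in LRU order
        self.cache_size = cache_size  # bounded main-memory budget for the cache
        self.hits = 0                 # stream tuples answered from the cache
        self.disk_lookups = 0         # stream tuples that had to go to "disk"

    def _lookup(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.disk_lookups += 1
        tuple_ = self.master.get(key)
        if tuple_ is not None:
            self.cache[key] = tuple_
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict least recently used
        return tuple_

    def join(self, stream):
        """Yield (key, master_tuple) for every stream key that has a match."""
        for key in stream:
            tuple_ = self._lookup(key)
            if tuple_ is not None:
                yield key, tuple_

    def hit_ratio(self):
        total = self.hits + self.disk_lookups
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    for label, s in (("Zipfian (s=1)", 1.0), ("uniform", 0.0)):
        op = CachedSemiStreamJoin(MASTER, cache_size=500)
        matches = list(op.join(zipf_keys(50_000, num_keys=10_000, s=s)))
        print(f"{label}: {len(matches)} matches, cache hit ratio {op.hit_ratio():.2f}")
```

Run as a script, the sketch should report a much higher cache hit ratio for the Zipfian stream than for the uniform one with the same 500-entry cache, which is the effect the abstract attributes to skew-aware caching. LRU was chosen only because it is the simplest bounded policy; a frequency-based admission rule may be closer to what a skew-oriented cache module does, but that is not stated in the abstract.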
Pages: 119-124
Number of pages: 6