A Robust Join Operator to Process Streaming Data in Real-Time Data Warehousing

Authors
Naeem, M. Asif [1]
Affiliations
[1] Auckland Univ Technol, Sch Comp & Math Sci, Auckland, New Zealand
Keywords
Real-time data warehousing; Semi-stream processing; Join operator; Performance measurement
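DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract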
In real-time data warehousing, semi-stream processing has become a promising area of research over the last decade. One important operation in semi-stream processing is joining stream data with slowly changing, disk-based master data. A join operator is usually required to implement this operation. The operator typically works with limited main memory, which is generally not large enough to hold all of the disk-based master data. Recently, a seminal join algorithm called MESHJOIN (Mesh Join) has been proposed in the literature to process semi-stream data. MESHJOIN is a candidate for a resource-aware system setup. However, MESHJOIN is not very selective. In particular, it does not consider the characteristics of the stream data, and its performance is suboptimal for skewed stream data. In this paper we propose a novel Semi-Stream Join (SSJ) that uses a new cache module. The algorithm is better suited to skewed distributions, and we present results for Zipfian distributions of the type that appear in many applications. We conduct a rigorous experimental study to test our algorithm. Our experiments show that SSJ outperforms MESHJOIN significantly. We also present a cost model for SSJ and validate it with experiments.
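The abstract does not describe how SSJ works internally, so the following is only a minimal sketch of the general idea it names: joining a stream against disk-based master data while keeping frequently requested master records in an in-memory cache. It is not the author's algorithm. The function name `cached_semi_stream_join`, the parameters `cache_size` and `promote_after`, the frequency-based eviction rule, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def cached_semi_stream_join(stream, master_lookup, cache_size=2, promote_after=2):
    """Join stream tuples with master records, caching frequently seen keys.

    Hypothetical illustration only; not the SSJ algorithm from the paper.
    `master_lookup` stands in for a (slow) disk access and returns None
    when the key has no master record.
    """
    cache = {}        # key -> master record held in main memory
    seen = Counter()  # number of times each key has appeared in the stream
    disk_reads = 0
    output = []

    for key, payload in stream:
        seen[key] += 1
        if key in cache:
            record = cache[key]          # served from memory, no disk access
        else:
            record = master_lookup(key)  # simulated disk access
            disk_reads += 1
            # Promote a key to the cache once it has been seen often enough.
            if record is not None and seen[key] >= promote_after:
                if len(cache) >= cache_size:
                    # Evict the cached key with the lowest stream frequency.
                    victim = min(cache, key=seen.__getitem__)
                    del cache[victim]
                cache[key] = record
        if record is not None:
            output.append((key, payload, record))

    return output, disk_reads


# Assumed toy data: key 1 dominates the stream, mimicking a skewed distribution.
master = {1: "gold", 2: "silver", 3: "bronze"}
stream = [(1, "a"), (1, "b"), (2, "c"), (1, "d"), (3, "e"), (1, "f"), (9, "g")]
joined, disk_reads = cached_semi_stream_join(stream, master.get)
```

With this toy stream, the dominant key is served from the cache after its second appearance, so fewer simulated disk reads are needed than stream tuples. Under a skewed distribution, this is the kind of effect a cache module is meant to exploit.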
Pages: 119-124
Number of pages: 6