HBelt: Integrating an Incremental ETL Pipeline with a Big Data Store for Real-Time Analytics

Cited by: 1
Authors
Qu, Weiping [1]
Shankar, Sahana [1]
Ganza, Sandy [1]
Dessloch, Stefan [1]
Affiliations
[1] Univ Kaiserslautern, Heterogeneous Informat Syst Grp, D-67663 Kaiserslautern, Germany
Keywords
SYSTEM;
DOI
10.1007/978-3-319-23135-8_9
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper demonstrates a system called HBelt which tightly integrates a distributed, key-value data store HBase with an extended ETL engine Kettle. The objective is to provide HBase tables with real-time data freshness in an efficient manner. A distributed ETL engine is extended and integrated as an overlay of HBase. Meanwhile, we extend this ETL engine with the capability of processing incremental ETL flows in a pipelined fashion. Delta batches are defined by the MVCC component in HBase to flush the incremental ETL pipeline for multiple concurrent read requests. Experimental results show that high query throughput can be achieved in HBelt for real-time analytics.
Pages: 123-137
Number of pages: 15
Related papers
50 in total
  • [31] Real-time QoS Monitoring for Big Data Analytics in Mobile Environment: an Overview
    Xiao, Fang
    Wainaina, Paul
    2016 INTERNATIONAL CONGRESS ON COMPUTATION ALGORITHMS IN ENGINEERING (ICCAE 2016), 2016, : 26 - 30
  • [32] Using Big Data and Real-Time Analytics to Support Smart City Initiatives
    Souza, Arthur
    Figueredo, Mickael
    Cacho, Nelio
    Araujo, Daniel
    Prolo, Carlos A.
    IFAC PAPERSONLINE, 2016, 49 (30): : 257 - 262
  • [33] Real-time big data analytics for hard disk drive predictive maintenance
    Su, Chuan-Jun
    Huang, Shi-Feng
    COMPUTERS & ELECTRICAL ENGINEERING, 2018, 71 : 93 - 101
  • [34] Towards Real-Time Road Traffic Analytics using Telco Big Data
    Costa, Constantinos
    Chatzimilioudis, Georgios
    Zeinalipour-Yazti, Demetrios
    Mokbel, Mohamed F.
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL WORKSHOP ON REAL-TIME BUSINESS INTELLIGENCE AND ANALYTICS, 2017,
  • [35] The growing role of integrated and insightful big and real-time data analytics platforms
    Ranganathan, Indrakumari
    Thangamuthu, Poongodi
    Palanimuthu, Suresh
    Balusamy, Balamurugan
    DIGITAL TWIN PARADIGM FOR SMARTER SYSTEMS AND ENVIRONMENTS: THE INDUSTRY USE CASES, 2020, 117 : 165 - 186
  • [36] GPGPU for Real-Time Data Analytics
    He, Bingsheng
    Huynh Phung Huynh
    Mong, Rick Goh Siow
    PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012, : 945 - +
  • [37] Performance Evaluation for Real-Time Messaging system in Big Data Pipeline Architecture
    Aung, Thandar
    Min, Hla Yin
    Maw, Aung Htein
    2018 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY (CYBERC 2018), 2018, : 198 - 204
  • [38] Data Systems Fault Coping for Real-time Big Data Analytics Required Architectural Crucibles
    Cohen, Stephen
    Money, William
    PROCEEDINGS OF THE 50TH ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, 2017, : 1023 - 1032
  • [39] Big Data Real-Time Clickstream Data Ingestion Paradigm for E-Commerce Analytics
    Pal, Gautam
    Li, Gangmin
    Atkinson, Katie
    2018 4TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2018,
  • [40] Mapping the Big Data Landscape: Technologies, Platforms and Paradigms for Real-Time Analytics of Data Streams
    Dubuc, Timothee
    Stahl, Frederic
    Roesch, Etienne B.
    IEEE ACCESS, 2021, 9 : 15351 - 15374