HBelt: Integrating an Incremental ETL Pipeline with a Big Data Store for Real-Time Analytics

Cited by: 1
Authors
Qu, Weiping [1]
Shankar, Sahana [1]
Ganza, Sandy [1]
Dessloch, Stefan [1]
Affiliations
[1] Univ Kaiserslautern, Heterogeneous Informat Syst Grp, D-67663 Kaiserslautern, Germany
Keywords
SYSTEM;
DOI
10.1007/978-3-319-23135-8_9
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper demonstrates a system called HBelt which tightly integrates a distributed, key-value data store HBase with an extended ETL engine Kettle. The objective is to provide HBase tables with real-time data freshness in an efficient manner. A distributed ETL engine is extended and integrated as an overlay of HBase. Meanwhile, we extend this ETL engine with the capability of processing incremental ETL flows in a pipelined fashion. Delta batches are defined by the MVCC component in HBase to flush the incremental ETL pipeline for multiple concurrent read requests. Experimental results show that high query throughput can be achieved in HBelt for real-time analytics.
Pages: 123-137
Number of pages: 15