Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems

Cited by: 0
Authors
Scheinert, Dominik [1]
Casares, Fabian [1]
Geldenhuys, Morgan K. [1]
Styp-Rekowski, Kevin [1]
Kao, Odej [1]
Affiliations
[1] Tech Univ Berlin, Berlin, Germany
Keywords
Distributed Stream Processing; Data Enrichment; Data Analysis; Resource Management; Cloud Computing
DOI
10.1109/IC2E59103.2023.00030
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Stream processing has become a critical component in the architecture of modern applications. With the exponential growth of data generation from sources such as the Internet of Things, business intelligence, and telecommunications, real-time processing of unbounded data streams has become a necessity. Distributed stream processing (DSP) systems address this challenge, offering high horizontal scalability, fault-tolerant execution, and the ability to process data streams from multiple sources in a single DSP job. Often, however, data streams need to be enriched with extra information for correct processing, which introduces additional dependencies and potential bottlenecks. In this paper, we present an in-depth evaluation of data enrichment methods for DSP systems and identify the different use cases for stream processing in modern systems. Using a representative DSP system and conducting the evaluation in a realistic cloud environment, we found that outsourcing enrichment data to the DSP system can improve performance for specific use cases. However, the accompanying increase in resource consumption highlights the need for stream processing solutions specifically designed for the performance-intensive workloads of cloud-based applications.
Pages: 202-211
Number of pages: 10
Related papers
50 in total
  • [31] ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems
    Lian, Jinqing
    Zhang, Xinyi
    Shao, Yingxia
    Pu, Zenglin
    Xiang, Qingfeng
    Li, Yawen
    Cui, Bin
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (13): 4282-4295
  • [32] RIoTBench: An IoT benchmark for distributed stream processing systems
    Shukla, Anshu
    Chaturvedi, Shilpa
    Simmhan, Yogesh
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (21)
  • [33] Poster: Iterative Scheduling for Distributed Stream Processing Systems
    Eskandari, Leila
    Mair, Jason
    Huang, Zhiyi
    Eyers, David
    DEBS'18: PROCEEDINGS OF THE 12TH ACM INTERNATIONAL CONFERENCE ON DISTRIBUTED AND EVENT-BASED SYSTEMS, 2018: 234-237
  • [34] Evaluation of Load Prediction Techniques for Distributed Stream Processing
    Gontarska, Kordian
    Geldenhuys, Morgan
    Scheinert, Dominik
    Wiesner, Philipp
    Polze, Andreas
    Thamsen, Lauritz
    2021 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING, IC2E 2021, 2021: 91-98
  • [35] Minimizing Overheads of Checkpoints in Distributed Stream Processing Systems
    Akber, Syed Muhammad Abrar
    Chen, Hanhua
    Wang, Yonghui
    Jin, Hai
    2018 IEEE 7TH INTERNATIONAL CONFERENCE ON CLOUD NETWORKING (CLOUDNET), 2018
  • [36] Deciding Backup Location Methods for Distributed Stream Processing System
    Iijima, Naoki
    Amemiya, Koichiro
    Ogawa, Jun
    Miyoshi, Hidenobu
    2020 3RD INTERNATIONAL CONFERENCE ON INFORMATION AND COMPUTER TECHNOLOGIES (ICICT 2020), 2020: 266-270
  • [37] Prompt: Dynamic Data-Partitioning for Distributed Micro-batch Stream Processing Systems
    Abdelhamid, Ahmed S.
    Mahmood, Ahmed R.
    Daghistani, Anas
    Aref, Walid G.
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020: 2455-2469
  • [38] Priority-based Resource Scheduling in Distributed Stream Processing Systems for Big Data Applications
    Bellavista, Paolo
    Corradi, Antonio
    Reale, Andrea
    Ticca, Nicola
    2014 IEEE/ACM 7TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC), 2014: 363-370
  • [39] Conceptual Survey on Data Stream Processing Systems
    Hesse, Guenter
    Lorenz, Martin
    2015 IEEE 21ST INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2015: 797-802
  • [40] On the Cost of Acking in Data Stream Processing Systems
    Pagliari, Alessio
    Huet, Fabrice
    Urvoy-Keller, Guillaume
    2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2019: 331-340