Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems

Authors
Scheinert, Dominik [1]
Casares, Fabian [1]
Geldenhuys, Morgan K. [1]
Styp-Rekowski, Kevin [1]
Kao, Odej [1]
Affiliations
[1] Tech Univ Berlin, Berlin, Germany
Keywords
Distributed Stream Processing; Data Enrichment; Data Analysis; Resource Management; Cloud Computing
DOI
10.1109/IC2E59103.2023.00030
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Stream processing has become a critical component in the architecture of modern applications. With the exponential growth of data generation from sources such as the Internet of Things, business intelligence, and telecommunications, real-time processing of unbounded data streams has become a necessity. Distributed stream processing (DSP) systems address this challenge by offering high horizontal scalability, fault-tolerant execution, and the ability to process data streams from multiple sources in a single DSP job. Often, however, data streams must be enriched with additional information before they can be processed correctly, which introduces additional dependencies and potential bottlenecks. In this paper, we present an in-depth evaluation of data enrichment methods for DSP systems and identify the different use cases for stream processing in modern systems. Using a representative DSP system and conducting the evaluation in a realistic cloud environment, we found that outsourcing enrichment data to the DSP system can improve performance for specific use cases. However, the resulting increase in resource consumption highlights the need for stream processing solutions specifically designed for the performance-intensive workloads of cloud-based applications.
Pages: 202-211
Page count: 10