Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems

Cited by: 0
Authors
Scheinert, Dominik [1]
Casares, Fabian [1]
Geldenhuys, Morgan K. [1]
Styp-Rekowski, Kevin [1]
Kao, Odej [1]
Affiliations
[1] Tech Univ Berlin, Berlin, Germany
Keywords
Distributed Stream Processing; Data Enrichment; Data Analysis; Resource Management; Cloud Computing
DOI
10.1109/IC2E59103.2023.00030
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Stream processing has become a critical component in the architecture of modern applications. With the exponential growth of data generated by sources such as the Internet of Things, business intelligence, and telecommunications, real-time processing of unbounded data streams has become a necessity. Distributed stream processing (DSP) systems address this challenge by offering high horizontal scalability, fault-tolerant execution, and the ability to process data streams from multiple sources in a single DSP job. Often, however, data streams must be enriched with additional information before they can be processed correctly, which introduces extra dependencies and potential bottlenecks. In this paper, we present an in-depth evaluation of data enrichment methods for DSP systems and identify the different use cases for stream processing in modern systems. Using a representative DSP system and conducting the evaluation in a realistic cloud environment, we found that outsourcing enrichment data to the DSP system can improve performance for specific use cases. However, the increased resource consumption that accompanies this approach highlights the need for stream processing solutions specifically designed for the performance-intensive workloads of cloud-based applications.
Pages: 202-211
Page count: 10
Related papers
50 in total
  • [41] Big Stream Processing Systems: An Experimental Evaluation
    Shahverdi, Elkhan
    Awad, Ahmed
    Sakr, Sherif
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW 2019), 2019, : 53 - 60
  • [42] A Predictive Scheduling Framework for Fast and Distributed Stream Data Processing
    Li, Teng
    Tang, Jian
    Xu, Jielong
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 333 - 338
  • [43] Efficient Operator Placement for Distributed Data Stream Processing Applications
    Nardelli, Matteo
    Cardellini, Valeria
    Grassi, Vincenzo
    Lo Presti, Francesco
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2019, 30 (08) : 1753 - 1767
  • [44] Distributed processing in up stream data retrieval for distance education
    Tagami, Y
    Ito, H
    Kumamoto, A
    ADVANCED RESEARCH IN COMPUTERS AND COMMUNICATIONS IN EDUCATION, VOL 1: NEW HUMAN ABILITIES FOR THE NETWORKED SOCIETY, 1999, 55 : 460 - 467
  • [45] Online Scheduling for Shuffle Grouping in Distributed Stream Processing Systems
    Rivetti, Nicolo
    Anceaume, Emmanuelle
    Busnel, Yann
    Querzoni, Leonardo
    Sericola, Bruno
    MIDDLEWARE '16: PROCEEDINGS OF THE 17TH INTERNATIONAL MIDDLEWARE CONFERENCE, 2016,
  • [46] Algorithms for Windowed Aggregations and Joins on Distributed Stream Processing Systems
    Verwiebe, Juliane
    Grulich, Philipp M.
    Traub, Jonas
    Markl, Volker
    Datenbank-Spektrum, 2022, 22 (02) : 99 - 107
  • [47] Model-driven scheduling for distributed stream processing systems
    Shukla, Anshu
    Simmhan, Yogesh
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2018, 117 : 98 - 114
  • [48] Caladrius: A Performance Modelling Service for Distributed Stream Processing Systems
    Kalim, Faria
    Cooper, Thomas
    Wu, Huijun
    Li, Yao
    Wang, Ning
    Lu, Neng
    Fu, Maosong
    Qian, Xiaoyao
    Luo, Hao
    Cheng, Da
    Wang, Yaliang
    Dai, Fred
    Ghosh, Mainak
    Wang, Beinan
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 1886 - 1897
  • [49] Modeling Distributed Stream Processing Systems under Heavy Workload
    Qureshi, Muhammad Mudassar
    Chen, Hanhua
    Jin, Hai
    2019 INTERNATIONAL CONFERENCE ON CYBERWORLDS (CW), 2019, : 93 - 100
  • [50] Intelligent Distributed Processing Methods for Big Data
    Jung, Jason J.
    Camacho, David
    Badica, Costin
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2015, 21 (06) : 754 - 756