Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines

Cited by: 26
Authors
Del Monte, Bonaventura [1, 2]
Zeuch, Steffen [1, 2]
Rabl, Tilmann [3]
Markl, Volker [1, 2]
Affiliations
[1] Tech Univ Berlin, Berlin, Germany
[2] DFKI GmbH, Kaiserslautern, Germany
[3] Univ Potsdam, HPI, Potsdam, Germany
Keywords
LATENCY;
DOI
10.1145/3318464.3389723
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Scale-out stream processing engines (SPEs) are powering large big data applications on high velocity data streams. Industrial setups require SPEs to sustain outages, varying data rates, and low-latency processing. SPEs need to transparently reconfigure stateful queries during runtime. However, state-of-the-art SPEs are not ready yet to handle on-the-fly reconfigurations of queries with terabytes of state due to three problems. These are network overhead for state migration, consistency, and overhead on data processing. In this paper, we propose Rhino, a library for efficient reconfigurations of running queries in the presence of very large distributed state. Rhino provides a handover protocol and a state migration protocol to consistently and efficiently migrate stream processing among servers. Overall, our evaluation shows that Rhino scales with state sizes of up to TBs, reconfigures a running query 15 times faster than the state-of-the-art, and reduces latency by three orders of magnitude upon a reconfiguration.
Pages: 2471 - 2486
Page count: 16
Related papers
50 in total
  • [1] FlowKV: A Semantic-Aware Store for Large-Scale State Management of Stream Processing Engines
    Lee, Gyewon
    Maeng, Jaewoo
    Park, Jinsol
    Seo, Jangho
    Cho, Haeyoon
    Yang, Youngseok
    Um, Taegeon
    Lee, Jongsung
    Lee, Jae W.
    Chun, Byung-Gon
    PROCEEDINGS OF THE EIGHTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS, EUROSYS 2023, 2023, : 768 - 783
  • [2] How to Measure Scalability of Distributed Stream Processing Engines?
    Henning, Soeren
    Hasselbring, Wilhelm
    COMPANION OF THE ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING, ICPE 2021, 2021, : 85 - 88
  • [3] A Backpressure Mitigation Scheme in Distributed Stream Processing Engines
    Hanif, Muhammad
    Yoon, Hyeongdeok
    Lee, Choonhwa
    2020 34TH INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING (ICOIN 2020), 2020, : 713 - 716
  • [4] Benchmarking Tool for Modern Distributed Stream Processing Engines
    Hanif, Muhammad
    Yoon, Hyeongdeok
    Lee, Choonhwa
    33RD INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING (ICOIN 2019), 2019, : 393 - 395
  • [5] Online Nonstop Task Management for Storm-Based Distributed Stream Processing Engines
    Zhang, Zhou
    Jin, Pei-Quan
    Xie, Xi-Ke
    Wang, Xiao-Liang
    Liu, Rui-Cheng
    Wan, Shou-Hong
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (01) : 116 - 138
  • [6] State Management in Apache Flink®: Consistent Stateful Distributed Stream Processing
    Carbone, Paris
    Ewen, Stephan
    Fora, Gyula
    Haridi, Seif
    Richter, Stefan
    Tzoumas, Kostas
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (12) : 1718 - 1729
  • [7] Efficient Coflow Transmission for Distributed Stream Processing
    Li, Wenxin
    Yuan, Xu
    Qu, Wenyu
    Qi, Heng
    Zhou, Xiaobo
    Chen, Sheng
    Xu, Renhai
    IEEE INFOCOM 2020 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2020, : 1319 - 1328
  • [8] SLA-Based Adaptation Schemes in Distributed Stream Processing Engines
    Hanif, Muhammad
    Kim, Eunsam
    Helal, Sumi
    Lee, Choonhwa
    APPLIED SCIENCES-BASEL, 2019, 9 (06):
  • [9] Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures
    Henning, Soeren
    Hasselbring, Wilhelm
    BIG DATA RESEARCH, 2021, 25
  • [10] Query-Centric Failure Recovery for Distributed Stream Processing Engines
    Su, Li
    Zhou, Yongluan
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 1276 - 1279