Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines

Cited by: 26
Authors
Del Monte, Bonaventura [1, 2]
Zeuch, Steffen [1, 2]
Rabl, Tilmann [3]
Markl, Volker [1, 2]
Affiliations
[1] Tech Univ Berlin, Berlin, Germany
[2] DFKI GmbH, Kaiserslautern, Germany
[3] Univ Potsdam, HPI, Potsdam, Germany
Keywords
LATENCY;
DOI
10.1145/3318464.3389723
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Scale-out stream processing engines (SPEs) are powering large big data applications on high velocity data streams. Industrial setups require SPEs to sustain outages, varying data rates, and low-latency processing. SPEs need to transparently reconfigure stateful queries during runtime. However, state-of-the-art SPEs are not ready yet to handle on-the-fly reconfigurations of queries with terabytes of state due to three problems. These are network overhead for state migration, consistency, and overhead on data processing. In this paper, we propose Rhino, a library for efficient reconfigurations of running queries in the presence of very large distributed state. Rhino provides a handover protocol and a state migration protocol to consistently and efficiently migrate stream processing among servers. Overall, our evaluation shows that Rhino scales with state sizes of up to TBs, reconfigures a running query 15 times faster than the state-of-the-art, and reduces latency by three orders of magnitude upon a reconfiguration.
Pages: 2471 - 2486
Page count: 16