Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines

Cited by: 26
Authors
Del Monte, Bonaventura [1 ,2 ]
Zeuch, Steffen [1 ,2 ]
Rabl, Tilmann [3 ]
Markl, Volker [1 ,2 ]
Affiliations
[1] Tech Univ Berlin, Berlin, Germany
[2] DFKI GmbH, Kaiserslautern, Germany
[3] Univ Potsdam, HPI, Potsdam, Germany
Keywords
LATENCY;
DOI
10.1145/3318464.3389723
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
Scale-out stream processing engines (SPEs) are powering large big data applications on high velocity data streams. Industrial setups require SPEs to sustain outages, varying data rates, and low-latency processing. SPEs need to transparently reconfigure stateful queries during runtime. However, state-of-the-art SPEs are not ready yet to handle on-the-fly reconfigurations of queries with terabytes of state due to three problems. These are network overhead for state migration, consistency, and overhead on data processing. In this paper, we propose Rhino, a library for efficient reconfigurations of running queries in the presence of very large distributed state. Rhino provides a handover protocol and a state migration protocol to consistently and efficiently migrate stream processing among servers. Overall, our evaluation shows that Rhino scales with state sizes of up to TBs, reconfigures a running query 15 times faster than the state-of-the-art, and reduces latency by three orders of magnitude upon a reconfiguration.
Pages: 2471-2486
Page count: 16
Related papers
50 in total
  • [41] Efficient metadata management in large distributed storage systems
    Brandt, SA
    Miller, EL
    Long, DDE
    Xue, L
    20TH IEEE/11TH NASA GODDARD CONFERENCE ON MASS STORAGE AND TECHNOLOGIES (MSST 2003), PROCEEDINGS, 2003, : 290 - 298
  • [42] A probabilistic dynamic technique for the distributed generation of very large state spaces
    Knottenbelt, WJ
    Harrison, PG
    Mestern, MA
    Kritzinger, PS
    PERFORMANCE EVALUATION, 2000, 39 (1-4) : 127 - 148
  • [43] Efficient processing techniques for very large-scale graph structure
    1600, Institute of Electronics Information Communication Engineers (97):
  • [44] RDF Data Storage Techniques for Efficient SPARQL Query Processing using Distributed Computation Engines
    Hassan, Mahmudul
    Bansal, Srividya K.
    2018 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI), 2018, : 323 - 330
  • [45] Three-level caching for efficient query processing in large web search engines
    Long, Xiaohui
    Suel, Torsten
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2006, 9 (04): 369 - 395
  • [46] Three-Level Caching for Efficient Query Processing in Large Web Search Engines
    Xiaohui Long
    Torsten Suel
    World Wide Web, 2006, 9 : 369 - 395
  • [47] Efficient Distributed Query Processing on Large Scale RDF Graph Data
    Wang X.
    Xu Q.
    Chai L.-L.
    Yang Y.-J.
    Chai Y.-P.
    Ruan Jian Xue Bao/Journal of Software, 2019, 30 (03): 498 - 514
  • [48] Hierarchical graph embedding for efficient query processing in very large traffic networks
    Kriegel, Hans-Peter
    Kroeger, Peer
    Renz, Matthias
    Schmidt, Tim
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2008, 5069 : 150 - +
  • [49] Decentralized management of bi-modal network resources in a distributed stream processing platform
    Asaduzzaman, Shah
    Maheswaran, Muthucumaru
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2011, 71 (06) : 774 - 787
  • [50] Resource Management and Scheduling in Distributed Stream Processing Systems: A Taxonomy, Review, and Future Directions
    Liu, Xunyun
    Buyya, Rajkumar
    ACM COMPUTING SURVEYS, 2020, 53 (03)