PStream: A Popularity-Aware Differentiated Distributed Stream Processing System

Cited by: 6
Authors
Chen, Hanhua [1]
Zhang, Fan [1]
Jin, Hai [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab,Cluster & Grid Comp, Wuhan 430074, Peoples R China
Keywords
Parallel processing; Throughput; Real-time systems; Memory management; Distributed databases; Storms; Scalability; Distributed stream processing system; skewness; load balance; DATA PARALLELISM; POWER
DOI
10.1109/TC.2020.3019689
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Real-world stream data with skewed distributions raises unique challenges for distributed stream processing systems. Existing stream workload partitioning schemes usually use a "one size fits all" design, which relies on either a shuffle grouping or a key grouping strategy to partition the stream workload among multiple processing units, leading to unsatisfactory system throughput and processing latency. In this article, we show that key grouping based schemes result in serious load imbalance and low computation efficiency in the presence of data skewness, while shuffle grouping schemes do not scale in terms of memory space. We argue that the key to efficient stream scheduling is the popularity of the stream data. We propose PStream, a popularity-aware differentiated distributed stream processing system that assigns hot keys using shuffle grouping and rare keys using key grouping. PStream leverages a novel lightweight probabilistic counting scheme to identify the currently hot keys in dynamic real-time streams. The scheme is extremely efficient in computation and memory consumption, so the predictor based on it can be integrated into the processing instances in the system. We further design an adaptive threshold configuration scheme that quickly adapts to popularity changes in highly dynamic real-time streams. We implement PStream on top of Apache Storm and conduct comprehensive experiments using large-scale traces from real-world systems to evaluate the performance of this design. Results show that PStream achieves a 2.3x improvement in processing throughput and reduces processing latency by 64 percent compared to state-of-the-art designs.
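The differentiated routing idea in the abstract can be illustrated with a minimal Python sketch. This is not the PStream implementation: the class name `PopularityAwarePartitioner`, the helper `worker_loads`, the fixed `hot_threshold`, and the exact `Counter` (standing in for the paper's probabilistic counting scheme and adaptive threshold, whose details the abstract does not give) are all assumptions made for illustration.

```python
import itertools
import zlib
from collections import Counter


class PopularityAwarePartitioner:
    """Hypothetical sketch: hot keys go round-robin, rare keys go by hash."""

    def __init__(self, num_workers, hot_threshold):
        self.num_workers = num_workers
        # Assumption: a fixed cutoff; the paper describes an adaptive threshold.
        self.hot_threshold = hot_threshold
        # Assumption: exact counts stand in for a probabilistic counter.
        self.counts = Counter()
        self._round_robin = itertools.cycle(range(num_workers))

    def route(self, key):
        """Return the index of the worker that should process this tuple."""
        self.counts[key] += 1
        if self.counts[key] > self.hot_threshold:
            # Hot key: spread tuples evenly over all workers (shuffle grouping).
            return next(self._round_robin)
        # Rare key: always the same worker for the same key (key grouping).
        return zlib.crc32(key.encode("utf-8")) % self.num_workers


def worker_loads(keys, num_workers, hot_threshold):
    """Count how many tuples each worker receives for a given stream."""
    partitioner = PopularityAwarePartitioner(num_workers, hot_threshold)
    loads = [0] * num_workers
    for key in keys:
        loads[partitioner.route(key)] += 1
    return loads


# A skewed toy stream: one very popular key plus many rare keys.
stream = ["hot"] * 1000 + [f"rare-{i}" for i in range(100)]
key_only = worker_loads(stream, 4, hot_threshold=float("inf"))
aware = worker_loads(stream, 4, hot_threshold=10)
print("key grouping only:", key_only)
print("popularity-aware: ", aware)
```

With an infinite threshold every tuple is hashed, so the single popular key lands entirely on one worker; with a finite threshold the popular key is spread across workers once it crosses the cutoff, while rare keys keep a stable assignment.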
Pages: 1582-1597
Page count: 16