PStream: A Popularity-Aware Differentiated Distributed Stream Processing System

Cited by: 6
Authors
Chen, Hanhua [1]
Zhang, Fan [1]
Jin, Hai [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab,Cluster & Grid Comp, Wuhan 430074, Peoples R China
Keywords
Parallel processing; Throughput; Real-time systems; Memory management; Distributed databases; Storms; Scalability; Distributed stream processing system; skewness; load balance; DATA PARALLELISM; POWER
DOI
10.1109/TC.2020.3019689
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Real-world stream data with skewed distributions poses unique challenges for distributed stream processing systems. Existing stream workload partitioning schemes usually follow a "one size fits all" design, using either a shuffle grouping or a key grouping strategy to partition the stream workload among multiple processing units, which leads to unsatisfactory system throughput and processing latency. In this article, we show that key grouping based schemes result in serious load imbalance and low computation efficiency in the presence of data skewness, while shuffle grouping schemes are not scalable in terms of memory space. We argue that the key to efficient stream scheduling is the popularity of the stream data. We propose PStream, a popularity-aware differentiated distributed stream processing system that assigns hot keys using shuffle grouping and rare keys using key grouping. PStream uses a novel lightweight probabilistic counting scheme to identify the currently hot keys in dynamic real-time streams. The scheme is extremely efficient in computation and memory consumption, so the predictor based on it can be integrated directly into the processing instances of the system. We further design an adaptive threshold configuration scheme that quickly adapts to popularity changes in highly dynamic real-time streams. We implement PStream on top of Apache Storm and conduct comprehensive experiments using large-scale traces from real-world systems to evaluate the performance of this design. Results show that PStream achieves a 2.3x improvement in processing throughput and reduces processing latency by 64 percent compared to state-of-the-art designs.
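The differentiated routing idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The class name `PopularityAwarePartitioner`, the `route` method, and the fixed `hot_threshold` are hypothetical. An exact `Counter` stands in for the paper's probabilistic counting scheme, and the fixed threshold stands in for its adaptive threshold configuration, neither of which the abstract specifies in detail.

```python
import zlib
from collections import Counter


class PopularityAwarePartitioner:
    """Illustrative router: hot keys are spread round-robin (shuffle
    grouping), rare keys are pinned to one worker by hash (key grouping)."""

    def __init__(self, num_workers, hot_threshold):
        self.num_workers = num_workers
        # Assumption: a fixed threshold; the paper adapts it at run time.
        self.hot_threshold = hot_threshold
        # Assumption: exact counts; the paper uses a probabilistic counter.
        self.counts = Counter()
        self._next_worker = 0

    def route(self, key):
        """Return the index of the worker that should process this tuple."""
        self.counts[key] += 1
        if self.counts[key] > self.hot_threshold:
            # Hot key: shuffle grouping, balance load across all workers.
            worker = self._next_worker
            self._next_worker = (self._next_worker + 1) % self.num_workers
            return worker
        # Rare key: key grouping, a stable hash keeps its state on one worker.
        return zlib.crc32(key.encode("utf-8")) % self.num_workers


if __name__ == "__main__":
    partitioner = PopularityAwarePartitioner(num_workers=4, hot_threshold=10)
    # A skewed toy stream: one dominant key plus a few rare ones.
    stream = ["hot"] * 100 + ["a", "b", "c"] * 2
    load = Counter(partitioner.route(key) for key in stream)
    print(dict(load))
```

In this sketch a rare key always lands on the same worker, so its per-key state lives in one place, while a key that crosses the threshold is spread over all workers at the cost of replicating its state, which is the trade-off between load balance and memory that the abstract describes.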
Pages: 1582-1597
Page count: 16