PStream: A Popularity-Aware Differentiated Distributed Stream Processing System

Cited by: 6
Authors
Chen, Hanhua [1]
Zhang, Fan [1]
Jin, Hai [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab,Cluster & Grid Comp, Wuhan 430074, Peoples R China
Keywords
Parallel processing; Throughput; Real-time systems; Memory management; Distributed databases; Storms; Scalability; Distributed stream processing system; skewness; load balance; DATA PARALLELISM; POWER;
DOI
10.1109/TC.2020.3019689
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Real-world stream data with skewed distributions raises unique challenges for distributed stream processing systems. Existing stream workload partitioning schemes usually follow a "one size fits all" design, using either a shuffle grouping or a key grouping strategy to partition the stream workload among multiple processing units, which leads to unsatisfactory system throughput and processing latency. In this article, we show that key grouping based schemes suffer from serious load imbalance and low computation efficiency in the presence of data skewness, while shuffle grouping schemes do not scale in terms of memory space. We argue that the key to efficient stream scheduling is the popularity of the stream data. We propose PStream, a popularity-aware differentiated distributed stream processing system that assigns hot keys using shuffle grouping and rare keys using key grouping. PStream leverages a novel lightweight probabilistic counting scheme to identify the currently hot keys in dynamic real-time streams. The scheme is extremely efficient in computation and memory consumption, so the predictor based on it can be well integrated into the processing instances in the system. We further design an adaptive threshold configuration scheme that quickly adapts to popularity changes in highly dynamic real-time streams. We implement PStream on top of Apache Storm and conduct comprehensive experiments using large-scale traces from real-world systems to evaluate the performance of this design. Results show that PStream achieves a 2.3x improvement in processing throughput and reduces processing latency by 64 percent compared to state-of-the-art designs.
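The differentiated routing idea in the abstract can be illustrated with a minimal Python sketch. It is not the PStream implementation: the class name `PopularityAwarePartitioner`, the fixed `hot_threshold`, and the exact `Counter` are all assumptions made for illustration. The paper's probabilistic counting scheme and adaptive threshold are not described in enough detail here to reproduce, so an exact counter and a constant threshold stand in for them, and round-robin stands in for shuffle grouping.

```python
import zlib
from collections import Counter
from itertools import count


class PopularityAwarePartitioner:
    """Hypothetical router: hot keys are spread, rare keys are pinned."""

    def __init__(self, num_workers, hot_threshold):
        self.num_workers = num_workers
        # Assumption: a fixed threshold, not the adaptive one from the paper.
        self.hot_threshold = hot_threshold
        # Assumption: exact counts as a stand-in for probabilistic counting.
        self.counts = Counter()
        self._round_robin = count()

    def is_hot(self, key):
        return self.counts[key] >= self.hot_threshold

    def route(self, key):
        """Return the index of the worker that should process this tuple."""
        self.counts[key] += 1
        if self.is_hot(key):
            # Shuffle-style grouping: spread a hot key over all workers.
            return next(self._round_robin) % self.num_workers
        # Key grouping: a stable hash pins a rare key to a single worker.
        return zlib.crc32(key.encode("utf-8")) % self.num_workers


partitioner = PopularityAwarePartitioner(num_workers=4, hot_threshold=3)
stream = ["a"] * 10 + ["b", "c", "b"]
assignments = [(key, partitioner.route(key)) for key in stream]
```

In this sketch the frequent key `"a"` ends up on every worker once it crosses the threshold, while the rare key `"b"` always lands on the same worker, so per-key state for rare keys stays on one instance.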
Pages: 1582-1597
Number of pages: 16