Lever: Towards Low-Latency Batched Stream Processing by Pre-Scheduling

Cited by: 5
Authors
Chen, Fei [1 ]
Wu, Song [1 ]
Jin, Hai [1 ]
Yao, Yin [1 ]
Liu, Zhiyi [1 ]
Gu, Lin [1 ]
Zhou, Yongluan [2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, SCTS CGCL, Wuhan, Hubei, Peoples R China
[2] Univ Copenhagen, Dept Comp Sci, Copenhagen, Denmark
Keywords
stream processing; recurring jobs; straggler; scheduling
DOI
10.1145/3127479.3132687
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
With the vast involvement of streaming big data in many applications (e.g., stock market data, sensor data, and social network data), quickly mining and analyzing such data is becoming increasingly important. To provide fault tolerance and efficient stream processing at scale, recent stream processing frameworks have proposed to adapt batch processing systems, such as MapReduce and Spark, to handle streaming data by grouping the streams into micro-batches and treating the workload as a continuous series of small jobs [1]. The fundamental challenge in building a batched stream processing system is to minimize the processing latency of each micro-batch.

In this paper, we focus on the straggler problem, where a subset of workers lag behind and significantly prolong the job completion time. The straggler problem is a well-known critical problem in parallel processing systems. Compared to large-batch processing, the straggler problem in micro-batch processing is more severe and harder to tackle. We argue that the problem with applying existing straggler mitigation solutions to micro-batch processing is that they detect (or predict) stragglers and re-schedule them too late in the data handling pipeline. The re-scheduling actions are carried out during task execution and hence inevitably increase the processing time of the micro-batches. Furthermore, as the data have already been dispatched, re-scheduling inherently incurs expensive data relocation. Such overhead becomes significant in micro-batch processing due to the short processing time of each micro-batch. We refer to this class of methods as post-scheduling techniques.

To address the problem, we propose a new pre-scheduling framework, called Lever, which predicts stragglers and makes timely scheduling decisions to minimize the processing latency. As shown in Figure 1, Lever periodically collects and analyzes the historical job profiles of recurring micro-batch jobs. Based on this information, Lever pre-schedules the data in three main steps: identifying potential stragglers, evaluating node capacity, and choosing suitable helpers. More importantly, Lever makes its re-scheduling decisions before the batching module dispatches the data. As the scheduling is done while the data are being batched, it does not increase the processing time of the micro-batch. [GRAPHICAL ABSTRACT]

We implemented Lever in Spark Streaming and contributed it to the open source community as an extension of Apache Spark Streaming. To the best of our knowledge, this is the first work specifically addressing the straggler problem in continuous micro-batch processing. We conducted various experiments to validate the effectiveness of Lever. The experimental results demonstrate that Lever reduces job completion time by 30.72% to 42.19% and significantly outperforms traditional techniques.
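To make the three pre-scheduling steps concrete, the following Python sketch renders the logic described in the abstract. It is a minimal, hypothetical illustration, not Lever's actual code: the function names (identify_stragglers, evaluate_capacity, choose_helpers, pre_schedule), the 1.5x-cluster-mean straggler threshold, and the capacity-proportional helper assignment are all assumptions made for this sketch; the real system implements these decisions inside Spark Streaming's batching and scheduling path.

    """Hypothetical sketch of Lever-style pre-scheduling.

    All names and heuristics here are illustrative assumptions,
    not Lever's actual implementation.
    """
    from statistics import mean

    # Historical per-node task runtimes (seconds), collected from past
    # executions of the recurring micro-batch job.
    HISTORY = {
        "node-a": [1.0, 1.1, 0.9],
        "node-b": [1.0, 0.9, 1.0],
        "node-c": [2.4, 2.6, 2.5],  # consistently slow: likely straggler
    }

    def identify_stragglers(history, factor=1.5):
        """Step 1: flag nodes whose mean runtime exceeds
        factor x the cluster-wide mean (assumed threshold)."""
        cluster_mean = mean(t for ts in history.values() for t in ts)
        return {n for n, ts in history.items()
                if mean(ts) > factor * cluster_mean}

    def evaluate_capacity(history, stragglers):
        """Step 2: estimate spare capacity of non-straggler nodes,
        here crudely proxied by the inverse of mean runtime."""
        return {n: 1.0 / mean(ts)
                for n, ts in history.items() if n not in stragglers}

    def choose_helpers(stragglers, capacity):
        """Step 3: split each straggler's offloaded input among helper
        nodes, proportional to their estimated spare capacity."""
        total = sum(capacity.values())
        return {s: {h: c / total for h, c in capacity.items()}
                for s in stragglers}

    def pre_schedule(history):
        """Runs BEFORE the batching module dispatches data, so the
        adjusted assignment adds no latency to task execution and
        avoids relocating already-dispatched data."""
        stragglers = identify_stragglers(history)
        capacity = evaluate_capacity(history, stragglers)
        return choose_helpers(stragglers, capacity)

    if __name__ == "__main__":
        # Roughly {'node-c': {'node-a': 0.49, 'node-b': 0.51}}
        print(pre_schedule(HISTORY))

The point the sketch captures is timing: because the plan is computed from historical profiles while the next micro-batch is still being formed, the data assignment is adjusted at dispatch time, which is the property the abstract credits for avoiding both extra task-execution latency and expensive data relocation.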
Pages: 643-643
Page count: 1