Lever: Towards Low-Latency Batched Stream Processing by Pre-Scheduling

Cited by: 5
Authors
Chen, Fei [1 ]
Wu, Song [1 ]
Jin, Hai [1 ]
Yao, Yin [1 ]
Liu, Zhiyi [1 ]
Gu, Lin [1 ]
Zhou, Yongluan [2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, SCTS CGCL, Wuhan, Hubei, Peoples R China
[2] Univ Copenhagen, Dept Comp Sci, Copenhagen, Denmark
Keywords
stream processing; recurring jobs; straggler; scheduling
DOI
10.1145/3127479.3132687
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
With the vast involvement of streaming big data in many applications (e.g., stock market data, sensor data, and social network data), quickly mining and analyzing such data is becoming increasingly important. To provide fault tolerance and efficient stream processing at scale, recent stream processing frameworks have proposed to adapt batch processing systems, such as MapReduce and Spark, to handle streaming data by grouping the streams into micro-batches and treating the workload as a continuous series of small jobs [1]. The fundamental challenge in building a batched stream processing system is to minimize the processing latency of each micro-batch.

In this paper, we focus on the straggler problem, where a subset of workers lag behind and significantly prolong the job completion time. The straggler problem is a well-known critical problem in parallel processing systems. Compared to large-batch processing, the straggler problem in micro-batch processing is more severe and harder to tackle. We argue that the problem with applying existing straggler mitigation solutions to micro-batch processing is that they detect (or predict) stragglers and re-schedule them too late in the data handling pipeline. The re-scheduling actions are carried out during task execution and hence inevitably increase the processing time of the micro-batches. Furthermore, as the data have already been dispatched, re-scheduling inherently incurs expensive data relocation. Such overhead becomes significant in micro-batch processing due to the short processing time of each micro-batch. We refer to this class of methods as post-scheduling techniques.

To address the problem, we propose a new pre-scheduling framework, called Lever, which predicts stragglers and makes timely scheduling decisions to minimize the processing latency. As shown in Figure 1, Lever periodically collects and analyzes the historical job profiles of recurring micro-batch jobs. Based on this information, Lever pre-schedules the data in three main steps: identifying potential stragglers, evaluating node capacity, and choosing suitable helpers. More importantly, Lever makes its re-scheduling decisions before the batching module dispatches the data. As the scheduling is done while the data are being batched, it does not increase the processing time of the micro-batch. [GRAPHICAL ABSTRACT]

We implemented Lever in Spark Streaming and contributed it to the open source community as an extension of Apache Spark Streaming. To the best of our knowledge, this is the first work specifically addressing the straggler problem in continuous micro-batch processing. We conducted various experiments to validate the effectiveness of Lever. The experimental results demonstrate that Lever reduces job completion time by 30.72% to 42.19% and significantly outperforms traditional techniques.
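To make the three pre-scheduling steps concrete, the following Python sketch renders the logic described in the abstract. It is a minimal, hypothetical illustration, not Lever's actual code: the function names (identify_stragglers, evaluate_capacity, choose_helpers, pre_schedule), the 1.5x-cluster-mean straggler threshold, and the capacity-proportional helper assignment are all assumptions made for this sketch; the real system implements these decisions inside Spark Streaming's batching and scheduling path.

    """Hypothetical sketch of Lever-style pre-scheduling.

    All names and heuristics here are illustrative assumptions,
    not Lever's actual implementation.
    """
    from statistics import mean

    # Historical per-node task runtimes (seconds), collected from past
    # executions of the recurring micro-batch job.
    HISTORY = {
        "node-a": [1.0, 1.1, 0.9],
        "node-b": [1.0, 0.9, 1.0],
        "node-c": [2.4, 2.6, 2.5],  # consistently slow: likely straggler
    }

    def identify_stragglers(history, factor=1.5):
        """Step 1: flag nodes whose mean runtime exceeds
        factor x the cluster-wide mean (assumed threshold)."""
        cluster_mean = mean(t for ts in history.values() for t in ts)
        return {n for n, ts in history.items()
                if mean(ts) > factor * cluster_mean}

    def evaluate_capacity(history, stragglers):
        """Step 2: estimate spare capacity of non-straggler nodes,
        here crudely proxied by the inverse of mean runtime."""
        return {n: 1.0 / mean(ts)
                for n, ts in history.items() if n not in stragglers}

    def choose_helpers(stragglers, capacity):
        """Step 3: split each straggler's offloaded input among helper
        nodes, proportional to their estimated spare capacity."""
        total = sum(capacity.values())
        return {s: {h: c / total for h, c in capacity.items()}
                for s in stragglers}

    def pre_schedule(history):
        """Runs BEFORE the batching module dispatches data, so the
        adjusted assignment adds no latency to task execution and
        avoids relocating already-dispatched data."""
        stragglers = identify_stragglers(history)
        capacity = evaluate_capacity(history, stragglers)
        return choose_helpers(stragglers, capacity)

    if __name__ == "__main__":
        # Roughly {'node-c': {'node-a': 0.49, 'node-b': 0.51}}
        print(pre_schedule(HISTORY))

The point the sketch captures is timing: because the plan is computed from historical profiles while the next micro-batch is still being formed, the data assignment is adjusted at dispatch time, which is the property the abstract credits for avoiding both extra task-execution latency and expensive data relocation.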
Pages: 643-643
Page count: 1