Dynamic Resource Allocation Strategy for Flink Iterative Jobs

Authors
Yue X.-F. [1]
Shi L. [1]
Zhao Y.-H. [1]
Ji H.-X. [1]
Wang G.-R. [2]
Affiliations
[1] School of Computer Science and Engineering, Northeastern University, Shenyang
[2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing
Source
Ruan Jian Xue Bao/Journal of Software, 2022, Vol. 33, No. 3
Keywords
Apache Flink; Iterative job; Resource allocation; Runtime limit; Runtime prediction
DOI
10.13328/j.cnki.jos.006447
Abstract
Apache Flink, an emerging distributed computing framework, supports the execution of large-scale iterative programs on a cluster, but its default static resource allocation mechanism cannot allocate resources in a way that ensures iterative jobs complete on time. To address this problem, and on the principle that users should actively express performance constraints rather than passively reserve resources, this paper proposes RABORP, a dynamic resource allocation strategy based on runtime prediction, which formulates and carries out a dynamic resource allocation plan for Flink iterative jobs that have explicit runtime limits. The main idea is to predict the runtime of each iteration superstep and, based on these predictions, to perform the initial resource allocation when the iterative job is submitted and to adjust resources dynamically at the synchronization barrier between supersteps, so that the job completes within the user-specified runtime limit while using the minimum set of resources. Comparative experiments were conducted by executing a variety of typical Flink iterative jobs on the experimental datasets. The results show that the proposed runtime prediction model accurately predicts the runtime of each superstep and that, compared with current state-of-the-art algorithms, the proposed strategy improves a range of performance metrics in both single-job and multi-job scenarios. © Copyright 2022, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
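The allocation loop described in the abstract (predict superstep runtimes, allocate at submission, re-plan at each synchronization barrier, and pick the smallest allocation that still meets the runtime limit) can be illustrated with a minimal Python sketch. It is a hedged illustration, not RABORP itself: the toy runtime model (a fixed per-superstep overhead plus parallel work divided by the parallelism `p`), the "slots" unit, the 10% slowdown used to simulate observed runtimes, and all function names (`predicted_remaining`, `min_parallelism`, `run_with_deadline`) are hypothetical assumptions, and nothing here is a Flink API.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical per-superstep runtime model: (superstep index, parallelism) -> seconds.
# Illustrative assumption only; this is NOT the prediction model used by RABORP.
RuntimeModel = Callable[[int, int], float]


def predicted_remaining(model: RuntimeModel, start: int, total: int, p: int) -> float:
    """Predicted runtime of supersteps start..total-1 if all run at parallelism p."""
    return sum(model(k, p) for k in range(start, total))


def min_parallelism(model: RuntimeModel, start: int, total: int,
                    time_left: float, max_p: int) -> Optional[int]:
    """Smallest parallelism whose predicted remaining runtime fits in time_left."""
    for p in range(1, max_p + 1):
        if predicted_remaining(model, start, total, p) <= time_left:
            return p
    return None  # the limit cannot be met even with max_p slots


def run_with_deadline(model: RuntimeModel, actual: RuntimeModel, total: int,
                      deadline: float, max_p: int) -> Tuple[List[int], float]:
    """Simulate an iterative job: plan at submission, re-plan at every barrier."""
    elapsed = 0.0
    plan: List[int] = []
    for k in range(total):
        # k == 0 is the initial allocation at job submission; k > 0 is the
        # dynamic adjustment at the synchronization barrier before superstep k.
        p = min_parallelism(model, k, total, deadline - elapsed, max_p)
        p = max_p if p is None else p  # best effort when the limit is unreachable
        plan.append(p)
        elapsed += actual(k, p)  # observed runtime feeds the next re-plan
    return plan, elapsed


def predict(k: int, p: int) -> float:
    # Toy model: 2 s fixed overhead plus 40 slot-seconds of parallel work.
    return 2.0 + 40.0 / p


def observe(k: int, p: int) -> float:
    # Simulated "actual" runtime: 10% slower than predicted.
    return 1.1 * predict(k, p)


plan, elapsed = run_with_deadline(predict, observe, total=5, deadline=100.0, max_p=16)
print(plan, round(elapsed, 1))  # [3, 3, 2, 3, 2] 99.0
```

Because each barrier recomputes the smallest sufficient parallelism from the remaining time budget, the plan shrinks when slack accumulates and grows again when the observed slowdown eats into it; in this toy run the job finishes at about 99 s against a 100 s limit. A real strategy would also need a runtime predictor trained on actual supersteps and a way to change a running Flink job's resources, both of which this sketch deliberately leaves out.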
Pages: 985-1004
Number of pages: 19