Algorithm of Scheduling for Data-intensive Computing Operations onto GPU Cluster

Cited: 0
Authors
Tang X.-C. [1 ]
Zhu Z.-Y. [1 ]
Mao A.-Q. [1 ]
Fu Y. [1 ]
Li Z.-H. [1 ]
Affiliation
[1] School of Computer Science, Northwestern Polytechnical University, Xi’an
Source
Ruan Jian Xue Bao/Journal of Software | 2022, Vol. 33, No. 12
Keywords
data intensive; data localization; fairness; GPU; minimum cost;
DOI
10.13328/j.cnki.jos.006362
Abstract
Data-intensive jobs comprise large numbers of tasks, and using GPU devices to accelerate them is currently the dominant approach. However, when it comes to sharing GPU resources fairly among data-intensive jobs while also reducing the cost of data transmission over the network, existing methods do not comprehensively address the conflict between resource fairness and data transmission cost. This study analyzes the characteristics of GPU cluster resource scheduling and proposes a scheduling algorithm based on minimum cost and maximum task count, which resolves the conflict between fair GPU allocation and high data transmission cost. Scheduling proceeds in two stages: in the first stage, each job produces its own optimal plan according to data transmission cost; in the second stage, the resource allocator merges the plans of all jobs. The study first presents the overall framework, in which the resource allocator works globally after each job has produced its own optimal plan. Second, it presents a network bandwidth estimation strategy and a method for computing the data transmission cost of a task. Third, it gives a basic algorithm for fair resource allocation based on the number of GPUs. Fourth, it proposes the minimum-cost, maximum-task-count scheduling algorithm and describes the implementation of its non-preemptive, preemptive, and resource fairness strategies. Finally, six data-intensive computing tasks are designed to evaluate the proposed algorithm; the experiments verify that the scheduling algorithm achieves about 90% resource fairness while also minimizing the parallel running time of jobs. © 2022 Chinese Academy of Sciences. All rights reserved.
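The two-stage scheme described in the abstract can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's actual algorithm: the `Job` fields, the node names, and the round-robin merge strategy are all assumptions. Stage 1 has each job rank candidate nodes by its estimated data-transfer cost; stage 2 has a global allocator grant GPUs one at a time in round-robin order over jobs (approximating GPU-count fairness), letting each job take its cheapest still-available node.

```python
# Hypothetical two-stage scheduler sketch in the spirit of the abstract.
# Assumed names: Job, stage1_plan, stage2_merge; node labels "n1", "n2".
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    demand: int        # number of GPUs requested
    cost: dict         # node -> estimated data-transfer cost for one task

def stage1_plan(job):
    """Stage 1: each job proposes nodes ordered by increasing transfer cost."""
    return sorted(job.cost, key=job.cost.get)

def stage2_merge(jobs, capacity):
    """Stage 2: the global allocator merges the per-job plans.

    One GPU is granted per job per round (round-robin), so jobs converge
    toward equal GPU counts; within its turn, a job takes the cheapest
    node that still has a free GPU.
    """
    free = dict(capacity)                       # node -> free GPU count
    alloc = {j.name: [] for j in jobs}
    plans = {j.name: stage1_plan(j) for j in jobs}
    pending = {j.name: j.demand for j in jobs}
    progress = True
    while progress:
        progress = False
        for j in jobs:
            if pending[j.name] == 0:
                continue
            for node in plans[j.name]:          # cheapest first
                if free.get(node, 0) > 0:
                    free[node] -= 1
                    alloc[j.name].append(node)
                    pending[j.name] -= 1
                    progress = True
                    break
    return alloc
```

With two jobs whose cheapest nodes differ, the merge places each job on its preferred node while both receive the same number of GPUs; the paper's full algorithm additionally handles bandwidth estimation, preemption, and fairness guarantees that this sketch omits.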
Pages: 4429-4451
Page count: 22