Multiple CNN-based Tasks Scheduling across Shared GPU Platform in Research and Development Scenarios

Cited by: 3

Authors:

Chen, Zhaoyun [1,2]

Luo, Lei [1]

Quan, Wei [1]

Shi, Yang [1,2]

Yu, Jie [1]

Wen, Mei [1,2]

Zhang, Chunyuan [1,2]

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Changsha, Hunan, Peoples R China

[2] Natl Key Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China

Source:

IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS) | 2018

Keywords:

CNN; AI Research and Development Scenario; Characterizing; Scheduling Exploration; GPU platform

DOI:

10.1109/HPCC/SmartCity/DSS.2018.00107

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

At many AI enterprises and research institutes, a shared server or cluster built on commodity GPU hardware must simultaneously process multiple diverse CNN-based tasks submitted by different developers and researchers. Scheduling and processing these tasks, which include training and batch inference, is a significant challenge in such practical scenarios. Previous studies, which focus on either the latency of a single training task or the throughput of multiple inference tasks, cannot effectively exploit the limited system resources available for diverse CNN-based tasks. This paper, for the first time, focuses on this specific AI Research and Development scenario and conducts a series of explorations on the characterization and scheduling of CNN-based tasks. To evaluate the quality of processing and scheduling, we propose a set of comprehensive metrics, including user satisfaction and system efficiency. Using these metrics, we characterize the behaviors of several typical CNN models under different application and system configuration factors. We then explore a heuristic scheduling algorithm, informed by our characterization, that better allocates computing resources to incoming tasks and schedules them dynamically on the cluster or server. Evaluated on multi-GPU platforms against two baseline strategies, the proposed algorithm improves system efficiency by up to 40% and decreases average response latency by around 38% for multiple CNN-based tasks.


Pages: 578-585

Number of pages: 8