Transparent I/O-Aware GPU Virtualization for Efficient Resource Consolidation

Cited by: 2
Authors
Gonzalez, Nelson Mimura [1]
Elengikal, Tonia [1]
Affiliations
[1] IBM Thomas J Watson Res Ctr, 1101 Kitchawan Rd, Yorktown Hts, NY 10598 USA
Keywords
GPU; virtualization; consolidation; disaggregation; cloud; HPC; HFGPU; OmniGPU; HFCUDA
DOI
10.1109/IPDPS49936.2021.00022
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Graphics processing units (GPUs) are widely used in high-performance computing (HPC) and cloud computing to accelerate workloads. Virtualization provides flexible access to resources while improving utilization and throughput. This is essential to resource disaggregation, which allows ubiquitous access to remote resources among nodes. However, remote GPU virtualization at scale suffers from severe performance degradation due to inter-node communication and resource consolidation overhead, especially for data-intensive workloads. We propose HFGPU, a GPU virtualization solution transparent to application code based on application programming interface (API) remoting. We define a virtual device manager that allows remote GPUs to be seen, managed, and used as though they were local. To perform at scale, we combine multi-adapter InfiniBand networking with a novel distributed I/O forwarding mechanism that eliminates consolidation bottlenecks and reduces data movement. Experiments with up to 1024 NVIDIA V100 GPUs demonstrate overhead lower than 1% for data-intensive operations.
Pages: 131-140
Number of pages: 10