Transparent I/O-Aware GPU Virtualization for Efficient Resource Consolidation

Cited by: 2
Authors
Gonzalez, Nelson Mimura [1]
Elengikal, Tonia [1]
Affiliations
[1] IBM Thomas J Watson Res Ctr, 1101 Kitchawan Rd, Yorktown Hts, NY 10598 USA
Keywords
GPU; virtualization; consolidation; disaggregation; cloud; HPC; HFGPU; OmniGPU; HFCUDA
DOI
10.1109/IPDPS49936.2021.00022
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Graphics processing units (GPUs) are widely used in high performance computing (HPC) and cloud computing to accelerate workloads. Virtualization provides flexible access to resources while improving utilization and throughput. This is essential to resource disaggregation, which allows ubiquitous access to remote resources among nodes. However, remote GPU virtualization at scale suffers from severe performance degradation due to inter-node communication and resource consolidation overhead, especially for data-intensive workloads. We propose HFGPU, a GPU virtualization solution transparent to application code based on application programming interface (API) remoting. We define a virtual device manager that allows remote GPUs to be seen, managed, and used as though they were local. To perform at scale, we combine multi-adapter InfiniBand networking with a novel distributed I/O forwarding mechanism that eliminates consolidation bottlenecks and reduces data movement. Experiments with up to 1024 NVIDIA V100 GPUs demonstrate overhead lower than 1% for data-intensive operations.
Pages: 131-140
Number of pages: 10
Related papers
50 in total
  • [1] I/O-aware gang scheduling
    Nakazawa, M
    Lowenthal, DK
    PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS, PROCEEDINGS, 2003, : 163 - 168
  • [2] I/O-Aware Flushing for HPC Caching Filesystem
    Tatebe, Osamu
    Hiraga, Kohei
    Ohtsuji, Hiroki
    2023 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING WORKSHOPS, CLUSTER WORKSHOPS, 2023, : 11 - 17
  • [3] RaFIO: A Random Forest I/O-Aware algorithm
    Slimani, Camelia
    Wu, Chun-Feng
    Chang, Yuan-Hao
    Rubini, Stephane
    Boukhobza, Jalil
    36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 521 - 528
  • [4] I/O-Aware Batch Scheduling for Petascale Computing Systems
    Zhou, Zhou
    Yang, Xu
    Zhao, Dongfang
    Rich, Paul
    Tang, Wei
    Wang, Jia
    Lan, Zhiling
    2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 254 - 263
  • [5] I/O-aware list scheduling for distributed embedded systems
    Karakehayov, Zdravko
    2005 IEEE INTELLIGENT DATA ACQUISITION AND ADVANCED COMPUTING SYSTEMS: TECHNOLOGY AND APPLICATIONS, 2005, : 167 - 172
  • [6] I/O-aware bandwidth allocation systems for petascale computing
    Zhou, Zhou
    Yang, Xu
    Zhao, Dongfang
    Rich, Paul
    Tang, Wei
    Wang, Jia
    Lan, Zhiling
    PARALLEL COMPUTING, 2016, 58 : 107 - 116
  • [7] IOPA: I/O-aware parallelism adaption for parallel programs
    Liu, Tao
    Liu, Yi
    Qian, Chen
    Qian, Depei
    PLOS ONE, 2017, 12 (03):
  • [8] Efficient consolidation-aware VCPU scheduling on multicore virtualization platform
    Wang, Bei
    Cheng, Yuxia
    Chen, Wenzhi
    He, Qinming
    Xiang, Yang
    Hassan, Mohammad Mehedi
    Alelaiwi, Abdulhameed
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 56 : 229 - 237
  • [9] iTRIM: I/O-Aware TRIM for Improving User Experience on Mobile Devices
    Liang, Yu
    Ji, Cheng
    Fu, Chenchen
    Ausavarungnirun, Rachata
    Li, Qiao
    Pan, Riwei
    Chen, Siyu
    Shi, Liang
    Kuo, Tei-Wei
    Xue, Chun Jason
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2021, 40 (09) : 1782 - 1795
  • [10] SpotKV: Improving Read Throughput of KVS by I/O-aware Cache and Adaptive Cuckoo Filters
    Liu, Yi
    Zhou, Ruilin
    Gan, Yuhang
    Qian, Chen
    2024 IEEE 17TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, CLOUD 2024, 2024, : 344 - 354