Transparent I/O-Aware GPU Virtualization for Efficient Resource Consolidation

Cited by: 2
Authors
Gonzalez, Nelson Mimura [1]
Elengikal, Tonia [1]
Affiliations
[1] IBM Thomas J Watson Res Ctr, 1101 Kitchawan Rd, Yorktown Hts, NY 10598 USA
Keywords
GPU; virtualization; consolidation; disaggregation; cloud; HPC; HFGPU; OmniGPU; HFCUDA
DOI
10.1109/IPDPS49936.2021.00022
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Graphics processing units (GPUs) are widely used in high performance computing (HPC) and cloud computing to accelerate workloads. Virtualization provides flexible access to resources while improving utilization and throughput. This is essential to resource disaggregation, which allows ubiquitous access to remote resources among nodes. However, remote GPU virtualization at scale suffers from severe performance degradation due to inter-node communication and resource consolidation overhead, especially for data-intensive workloads. We propose HFGPU, a GPU virtualization solution transparent to application code based on application programming interface (API) remoting. We define a virtual device manager that allows remote GPUs to be seen, managed, and used as though they were local. To perform at scale, we combine multi-adapter InfiniBand networking with a novel distributed I/O forwarding mechanism that eliminates consolidation bottlenecks and reduces data movement. Experiments with up to 1024 NVIDIA V100 GPUs demonstrate overhead lower than 1% for data-intensive operations.
Pages: 131-140
Page count: 10