Orchestrated Scheduling and Partitioning for Improved Address Translation in GPUs

Cited by: 1

Authors:

Li, Bingyao [1]

Wang, Yueqi [1]

Tang, Xulong [1]

Affiliations:

[1] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA 15260 USA

Source:

2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC | 2023

Keywords:

UVM; GPU; TLB;

DOI:

10.1109/DAC56929.2023.10247943

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Unified Virtual Memory (UVM) is a promising feature in CPU-GPU heterogeneous systems that allows data structures to be accessed by both the CPU and GPUs through unified pointers without explicit data copying. However, the delivered performance of UVM depends heavily on the efficiency of address translation. Current GPU thread block (TB) management is not aware of the translation process and heavily thrashes the per-streaming multiprocessor (SM) private Translation Lookaside Buffers (TLBs). In this paper, we conduct a comprehensive characterization of 10 GPU benchmarks and quantify the translation reuse among thread blocks. Our observations reveal that substantial translation reuse exists within TBs rather than across TBs. Moreover, inter-TB interference significantly enlarges the intra-TB translation reuse distances. To this end, we propose translation-aware TB scheduling and lightweight GPU L1 TLB partitioning to effectively mitigate the contention. Experimental results show that our proposed approach improves the L1 TLB hit rate, and this improvement translates to, on average, a 12.5% execution time reduction.
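The interference effect described in the abstract can be illustrated with a toy model. The sketch below is not the paper's scheduler or partitioning scheme: it assumes a fully associative LRU TLB, one thread block (TB 0) that loops over four pages, and a second thread block (TB 1) that streams through pages it never reuses. All names (`LruTlb`, `make_trace`, `reuse_distances`) and sizes are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class LruTlb:
    """Toy fully associative TLB with LRU replacement (an assumption)."""

    def __init__(self, entries):
        self.entries = entries
        self.pages = OrderedDict()
        self.hits = 0
        self.accesses = 0

    def access(self, page):
        self.accesses += 1
        if page in self.pages:
            self.pages.move_to_end(page)    # mark as most recently used
            self.hits += 1
            return
        if len(self.pages) >= self.entries:
            self.pages.popitem(last=False)  # evict least recently used
        self.pages[page] = None


def make_trace(rounds=10, working_set=4, stream_per_access=2):
    """Interleave TB 0 (loops over a small page set) with TB 1 (streams)."""
    trace, next_stream_page = [], 1000
    for _ in range(rounds):
        for page in range(working_set):
            trace.append((0, page))
            for _ in range(stream_per_access):
                trace.append((1, next_stream_page))
                next_stream_page += 1
    return trace


def reuse_distances(pages):
    """Distinct pages touched between consecutive accesses to the same page."""
    last_seen, dists = {}, []
    for idx, page in enumerate(pages):
        if page in last_seen:
            dists.append(len(set(pages[last_seen[page] + 1:idx])))
        last_seen[page] = idx
    return dists


def shared_hit_rate(trace, entries):
    """All TBs compete for one TLB."""
    tlb = LruTlb(entries)
    for _, page in trace:
        tlb.access(page)
    return tlb.hits / tlb.accesses


def partitioned_hit_rate(trace, entries, num_tbs):
    """Each TB gets an equal, private slice of the TLB."""
    parts = [LruTlb(entries // num_tbs) for _ in range(num_tbs)]
    for tb, page in trace:
        parts[tb].access(page)
    return sum(p.hits for p in parts) / sum(p.accesses for p in parts)


trace = make_trace()
alone = reuse_distances([p for tb, p in trace if tb == 0])
mixed = reuse_distances([p for _, p in trace])
print("TB 0 reuse distance alone:      ", max(alone))
print("TB 0 reuse distance interleaved:", max(mixed))
print("shared 8-entry hit rate:     ", shared_hit_rate(trace, 8))
print("partitioned 2x4 hit rate:    ", partitioned_hit_rate(trace, 8, 2))
```

In this model, interleaving stretches TB 0's reuse distance beyond the capacity of the shared 8-entry TLB, so its translations are evicted before they are reused. Giving TB 0 a private 4-entry slice keeps its working set resident regardless of what TB 1 does. Real GPU TLBs are set-associative and serve many more thread blocks, so the numbers here are only qualitative.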


Pages: 6

