Orchestrated Scheduling and Partitioning for Improved Address Translation in GPUs

Cited by: 1
Authors
Li, Bingyao [1]
Wang, Yueqi [1]
Tang, Xulong [1]
Affiliations
[1] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA 15260 USA
Keywords
UVM; GPU; TLB
DOI
10.1109/DAC56929.2023.10247943
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unified Virtual Memory (UVM) is a promising feature in CPU-GPU heterogeneous systems that allows data structures to be accessed by both the CPU and GPUs through unified pointers, without explicit data copying. However, the performance UVM delivers depends heavily on the efficiency of address translation. Current GPU thread block (TB) management is not aware of the translation process and heavily thrashes the private Translation Lookaside Buffers (TLBs) of each streaming multiprocessor (SM). In this paper, we conduct a comprehensive characterization of 10 GPU benchmarks and quantify translation reuse among thread blocks. Our observations reveal that substantial translation reuse exists within TBs rather than across TBs. Moreover, inter-TB interference significantly enlarges intra-TB translation reuse distances. To this end, we propose translation-aware TB scheduling and lightweight GPU L1 TLB partitioning to effectively mitigate the contention. Experimental results show that the proposed approach improves the L1 TLB hit rate, and that this improvement translates to a 12.5% reduction in execution time on average.
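The interference effect described in the abstract can be illustrated with a toy model. The sketch below is not the paper's mechanism: it assumes a fully associative LRU TLB, two hypothetical thread blocks with made-up page traces (`tb0` reuses two pages, `tb1` streams through new pages), and a hypothetical static 2+1 entry split. It only shows how interleaved accesses from another TB can stretch a TB's reuse distance past the TLB capacity, and how isolating the TBs can recover the hits.

```python
from collections import OrderedDict

def lru_hits(pages, capacity):
    """Count hits for a page-number trace in a fully associative LRU TLB."""
    tlb = OrderedDict()
    hits = 0
    for page in pages:
        if page in tlb:
            hits += 1
            tlb.move_to_end(page)        # mark as most recently used
        else:
            if len(tlb) >= capacity:
                tlb.popitem(last=False)  # evict the least recently used entry
            tlb[page] = None
    return hits

# Hypothetical traces (assumptions for illustration only).
tb0 = [0, 1] * 4              # TB 0 keeps reusing two pages
tb1 = list(range(100, 108))   # TB 1 streams through pages with no reuse

# Shared 3-entry TLB: accesses of the two TBs are interleaved one by one.
interleaved = [page for pair in zip(tb0, tb1) for page in pair]
shared_hits = lru_hits(interleaved, capacity=3)

# Partitioned TLB: TB 0 gets 2 entries, TB 1 gets 1 entry (same total of 3).
partitioned_hits = lru_hits(tb0, capacity=2) + lru_hits(tb1, capacity=1)

print(shared_hits, partitioned_hits)
```

In this toy trace the shared TLB never hits, because each reuse in `tb0` is separated by three other distinct pages, while the partitioned configuration turns every non-cold access of `tb0` into a hit.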
Pages: 6