A Quantitative Study of Locality in GPU Caches for Memory-Divergent Workloads

Cited by: 3
Authors
Lal, Sohan [1 ,2 ]
Varma, Bogaraju Sharatchandra [3 ]
Juurlink, Ben [2 ]
Affiliations
[1] Tech Univ Hamburg, Hamburg, Germany
[2] Tech Univ Berlin, Berlin, Germany
[3] Ulster Univ, Jordanstown, Northern Ireland
Keywords
Data locality; GPU caches; Memory divergence
DOI
10.1007/s10766-022-00729-2
CLC Number
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
GPUs are capable of delivering peak performance in TFLOPs; however, peak performance is often difficult to achieve due to several performance bottlenecks. Memory divergence is one such bottleneck: it makes locality harder to exploit, causes cache thrashing and high miss rates, and thereby impedes GPU performance. As data locality is crucial for performance, there have been several efforts to exploit it in GPUs. However, a quantitative analysis of data locality, which could pave the way for optimizations, has been lacking. In this paper, we quantitatively study data locality and its limits in GPUs at different granularities. We show that, in contrast to previous studies, there is significantly higher inter-warp locality at the L1 data cache for memory-divergent workloads. We further show that about 50% of the cache capacity and other scarce resources, such as NoC bandwidth, are wasted due to data over-fetch caused by memory divergence. While the low spatial utilization of cache lines justifies a sectored-cache design, which fetches only the sectors of a cache line that are needed by a request, our limit study reveals lost spatial locality: additional memory requests are needed to fetch the other sectors of the same cache line. This lost spatial locality presents opportunities for further optimizing the cache design.
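The over-fetch effect described in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is illustrative only, not the paper's methodology: it assumes a 128-byte cache line split into four 32-byte sectors (typical GPU L1 parameters) and compares a coalesced warp access pattern against a fully divergent one.

```python
# Illustrative sketch (not the paper's methodology): estimate how much of
# each 128-byte cache line a warp actually touches, and how much fetch
# traffic a sectored cache (four 32-byte sectors) would incur instead of
# fetching full lines. Line/sector/word sizes are assumed, typical values.

LINE_SIZE = 128    # bytes per cache line (assumed, typical GPU L1)
SECTOR_SIZE = 32   # bytes per sector in a sectored design (assumed)
WORD_SIZE = 4      # each thread loads one 4-byte word

def fetch_stats(addresses):
    """Return (useful bytes, full-line fetch bytes, sectored fetch bytes)
    for a list of per-thread word addresses from one warp."""
    lines = {}
    for addr in addresses:
        lines.setdefault(addr // LINE_SIZE, set()).add(addr)
    useful = sum(len(words) * WORD_SIZE for words in lines.values())
    full_line_traffic = len(lines) * LINE_SIZE
    sector_traffic = sum(
        len({(a % LINE_SIZE) // SECTOR_SIZE for a in words}) * SECTOR_SIZE
        for words in lines.values()
    )
    return useful, full_line_traffic, sector_traffic

# Coalesced warp: 32 consecutive words fill exactly one cache line.
coalesced = [i * WORD_SIZE for i in range(32)]
# Divergent warp: stride-128 words touch 32 lines, 4 bytes used per line.
divergent = [i * 128 for i in range(32)]

for name, accesses in [("coalesced", coalesced), ("divergent", divergent)]:
    used, full, sect = fetch_stats(accesses)
    print(f"{name}: useful={used}B, full-line fetch={full}B "
          f"({100 * used / full:.1f}% utilized), sectored fetch={sect}B")
```

For the divergent pattern, only 128 of the 4096 fetched bytes are useful (3.1% line utilization), while a sectored cache would fetch 1024 bytes; this mirrors the abstract's point that sectoring reduces over-fetch, at the cost of extra requests if the other sectors turn out to be needed later.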
Pages: 189-216
Page count: 28
Related Papers
50 in total
  • [1] A Quantitative Study of Locality in GPU Caches for Memory-Divergent Workloads
    Sohan Lal
    Bogaraju Sharatchandra Varma
    Ben Juurlink
    International Journal of Parallel Programming, 2022, 50 : 189 - 216
  • [2] A Quantitative Study of Locality in GPU Caches
    Lal, Sohan
    Juurlink, Ben
    EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION, SAMOS 2020, 2020, 12471 : 228 - 242
  • [3] Modeling Emerging Memory-Divergent GPU Applications
    Wang, Lu
    Jahre, Magnus
    Adileh, Almutaz
    Wang, Zhiying
    Eeckhout, Lieven
    IEEE COMPUTER ARCHITECTURE LETTERS, 2019, 18 (02) : 95 - 98
  • [4] Analyzing Data Locality on GPU Caches Using Static Profiling of Workloads
    Kim, Jieun
    Eom, Hyeonsang
    Kim, Yoonhee
    IEEE ACCESS, 2023, 11 : 95939 - 95947
  • [5] Selective Replication in Memory-Side GPU Caches
    Zhao, Xia
    Jahre, Magnus
    Eeckhout, Lieven
    2020 53RD ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO 2020), 2020, : 967 - 980
  • [6] Understanding and Optimizing GPU Cache Memory Performance for Compute Workloads
    Choo, Kyoshin
    Panlener, William
    Jang, Byunghyun
    2014 IEEE 13TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC), 2014, : 189 - 196
  • [7] GPU Memory Reallocation Techniques in Fully Homomorphic Encryption Workloads
    Choi, Jake
    Jung, Sunchul
    Yeom, Heonyoung
    39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 1525 - 1532
  • [8] Optimizing Locality-Aware Memory Management of Key-Value Caches
    Hu, Xiameng
    Wang, Xiaolin
    Zhou, Lan
    Luo, Yingwei
    Ding, Chen
    Jiang, Song
    Wang, Zhenlin
    IEEE TRANSACTIONS ON COMPUTERS, 2017, 66 (05) : 862 - 875
  • [9] Analyzing data locality in GPU kernels using memory footprint analysis
    Kiani, Mohsen
    Rajabzadeh, Amir
    SIMULATION MODELLING PRACTICE AND THEORY, 2019, 91 : 102 - 122
  • [10] Characterizing Large Dataset GPU Compute Workloads Targeting Systems with Die-Stacked Memory
    Ramanathan, Srividya
    Hazari, Gautam
    Lahiri, Kanishka
    Spadini, Francesco
    2015 IEEE 22ND INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2015, : 204 - 213