Characterizing Large Dataset GPU Compute Workloads Targeting Systems with Die-Stacked Memory

Cited by: 0
Authors
Ramanathan, Srividya [1]
Hazari, Gautam [1]
Lahiri, Kanishka [1]
Spadini, Francesco [1]
Affiliation
[1] Advanced Micro Devices, Inc. (AMD), Sunnyvale, CA 94088, USA
DOI
10.1109/HiPC.2015.25
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
The increasing adoption of GPUs as mainstream computing devices, coupled with the imminent availability of large high-bandwidth caches based on die-stacked memory, makes it important to analyze and understand modern GPU compute applications from the perspective of their memory access and data reuse characteristics. This paper presents detailed workload characterization studies of four GPU compute applications that process large data sets. The applications studied include tree traversal and search algorithms, a partial differential equation (PDE) solver, and a synthetic array processing application. Our studies indicate that while the memory footprint of these applications can be very large, the effectiveness of several GB of cache may vary significantly across workloads. This suggests that cache resources in a die-stacked memory based system need to be provisioned very carefully, through detailed characterization of target workloads. An added benefit of our work was the discovery that accurate memory characterization data can lead to a significantly better strategy for scheduling GPU threads, one that takes advantage of a workload's access characteristics. In particular, for the PDE solver, our analysis led to an optimization that achieved a measured 30% gain in application performance. This paper also describes our analysis methodology for conducting these types of studies. The methodology is based on trace analysis, where the traces capture memory traffic and calls to the GPU compute API. For each application we highlight the characterization metrics and analysis techniques that were most useful in generating insights about its memory access patterns.
Pages: 204–213
Page count: 10
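The abstract describes a trace-based methodology but does not give the authors' tooling. The following is a minimal sketch, not the paper's method, of two trace-derived metrics the abstract mentions: memory footprint and how effective a cache of a given size would be. It assumes a memory trace already reduced to a list of byte addresses, a 64-byte cache line, and a fully associative LRU cache as a simplified stand-in for a die-stacked DRAM cache. All function names (`to_lines`, `footprint_bytes`, `lru_hit_rate`, `hit_rate_curve`) are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Iterable, List

LINE_SIZE = 64  # bytes per cache line (assumed, not taken from the paper)


def to_lines(addresses: Iterable[int], line_size: int = LINE_SIZE) -> List[int]:
    """Map byte addresses from a memory trace to cache-line indices."""
    return [addr // line_size for addr in addresses]


def footprint_bytes(lines: List[int], line_size: int = LINE_SIZE) -> int:
    """Memory footprint: number of distinct cache lines touched, in bytes."""
    return len(set(lines)) * line_size


def lru_hit_rate(lines: List[int], capacity_lines: int) -> float:
    """Hit rate of a fully associative LRU cache holding capacity_lines lines."""
    cache = OrderedDict()  # keys are line indices, ordered from LRU to MRU
    hits = 0
    for line in lines:
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(lines) if lines else 0.0


def hit_rate_curve(lines: List[int], capacities: List[int]) -> Dict[int, float]:
    """Sweep cache capacities (in lines) to see how much reuse each one captures."""
    return {c: lru_hit_rate(lines, c) for c in capacities}


if __name__ == "__main__":
    # Hypothetical synthetic trace: two streaming passes over 4 cache lines.
    trace = [0, 64, 128, 192] * 2
    lines = to_lines(trace)
    print(footprint_bytes(lines))          # 256
    print(hit_rate_curve(lines, [2, 4]))   # {2: 0.0, 4: 0.5}
```

The toy trace shows the effect the abstract warns about: a cache smaller than the reuse set of a streaming pattern captures no reuse at all, while one that holds the whole set captures all of it. Re-simulating once per capacity is simple but slow for GB-scale traces; a single-pass stack-distance (Mattson) analysis yields the hit rate for every LRU capacity at once. Real die-stacked caches are also set-associative and may use different block sizes, so this sketch gives only a first-order estimate.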