Characterizing Large Dataset GPU Compute Workloads Targeting Systems with Die-Stacked Memory

Cited by: 0
Authors
Ramanathan, Srividya [1 ]
Hazari, Gautam [1 ]
Lahiri, Kanishka [1 ]
Spadini, Francesco [1 ]
Affiliations
[1] Adv Micro Devices Inc AMD, Sunnyvale, CA 94088 USA
DOI
10.1109/HiPC.2015.25
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
The increasing adoption of GPUs as mainstream computing devices, coupled with the imminent availability of large high-bandwidth caches based on die-stacked memory, makes it important to analyze and understand modern GPU compute applications from the perspective of their memory access and data reuse characteristics. This paper presents detailed workload characterization studies on four GPU compute applications that process large data sets. The applications studied include tree traversal and search algorithms, a partial differential equation (PDE) solver, and a synthetic array processing application. Our studies indicate that while the memory footprint consumed by these applications can be very large, the effectiveness of several GB worth of cache may vary significantly across workloads. This suggests that cache resources in a system based on die-stacked memory need to be provisioned very carefully, through detailed characterization of target workloads. An added benefit of our work was the discovery that accurate memory characterization data can lead to a significantly more optimized strategy for scheduling GPU threads by taking advantage of a workload's access characteristics. In particular, for the PDE solver, our analysis led to an optimization that achieved a 30% measured gain in application performance. This paper also describes our analysis methodology for conducting these types of studies. The methodology is based on trace analysis, where the traces capture memory traffic and calls to the GPU compute API. For each application, we highlight the characterization metrics and analysis techniques that were most useful in generating insights about its memory access patterns.
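Illustrative note (not taken from the paper): the abstract's point that a large footprint does not by itself predict cache effectiveness can be sketched with a toy trace analysis. The snippet below is a minimal sketch, assuming a fully associative LRU cache, 64-byte lines, and synthetic address traces; the function names `lru_hit_rate` and `sweep_trace` are hypothetical and do not come from the authors' tooling or methodology.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity_lines, line_bytes=64):
    """Hit rate of a fully associative LRU cache over a byte-address trace.

    Assumption: a simplified cache model chosen for illustration only.
    """
    cache = OrderedDict()  # keys are cache-line indices, ordered by recency
    hits = 0
    for addr in trace:
        line = addr // line_bytes
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict least recently used line
    return hits / len(trace) if trace else 0.0


def sweep_trace(n_lines, passes, line_bytes=64):
    """Synthetic trace: sweep n_lines distinct lines in order, `passes` times."""
    return [i * line_bytes for _ in range(passes) for i in range(n_lines)]


if __name__ == "__main__":
    big_sweep = sweep_trace(n_lines=1000, passes=2)   # footprint exceeds cache
    small_hot = sweep_trace(n_lines=100, passes=20)   # footprint fits in cache
    for name, trace in (("big_sweep", big_sweep), ("small_hot", small_hot)):
        print(name, lru_hit_rate(trace, capacity_lines=500))
```

In this toy setup, a cyclic sweep slightly larger than the cache gets no benefit from LRU, while the same number of accesses over a smaller working set hits almost every time. This is one simple way to see why per-workload characterization matters when sizing a cache.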
Pages: 204-213
Page count: 10