Characterizing Large Dataset GPU Compute Workloads Targeting Systems with Die-Stacked Memory

Cited by: 0

Authors:

Ramanathan, Srividya [1]

Hazari, Gautam [1]

Lahiri, Kanishka [1]

Spadini, Francesco [1]

Affiliation:

[1] Advanced Micro Devices Inc. (AMD), Sunnyvale, CA 94088, USA

Source:

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) | 2015

DOI:

10.1109/HiPC.2015.25

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

The increasing adoption of GPUs as mainstream computing devices, coupled with the imminent availability of large high-bandwidth caches based on die-stacked memory, makes it important to analyze and understand modern GPU compute applications from the perspective of their memory access and data reuse characteristics. This paper presents detailed workload characterization studies of four GPU compute applications that process large data sets. The applications studied include tree traversal and search algorithms, a partial differential equation (PDE) solver, and a synthetic array processing application. Our studies indicate that while the memory footprint of these applications can be very large, the effectiveness of several GB of cache may vary significantly across workloads. This suggests that cache resources in a system based on die-stacked memory need to be provisioned carefully, through detailed characterization of target workloads. An added benefit of our work was the discovery that accurate memory characterization data can lead to a significantly better strategy for scheduling GPU threads, one that takes advantage of a workload's access characteristics. In particular, for the PDE solver, our analysis led to an optimization that achieved a 30% measured gain in application performance. This paper also describes our analysis methodology for conducting these types of studies. The methodology is based on trace analysis, where the traces capture memory traffic and calls to the GPU compute API. For each application, we highlight the characterization metrics and analysis techniques that were most useful in generating insights about its memory access patterns.

Pages: 204-213

Page count: 10
