MRPB: Memory Request Prioritization for Massively Parallel Processors

Cited by: 0

Authors:

Jia, Wenhao [1]

Shaw, Kelly A. [2]

Martonosi, Margaret [1]

Affiliations:

[1] Princeton Univ, Princeton, NJ 08544 USA

[2] Univ Richmond, Richmond, VA 23173 USA

Source:

2014 20th IEEE International Symposium on High Performance Computer Architecture (HPCA-20) | 2014

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Massively parallel, throughput-oriented systems such as graphics processing units (GPUs) offer high performance for a broad range of programs. They are, however, complex to program, especially because of their intricate memory hierarchies with multiple address spaces. In response, modern GPUs have widely adopted caches, hoping to provide smoother reductions in memory access traffic and latency. Unfortunately, GPU caches often have a mixed or unpredictable performance impact due to cache contention that results from the high thread counts in GPUs. We propose the memory request prioritization buffer (MRPB) to ease GPU programming and improve GPU performance. This hardware structure improves the caching efficiency of massively parallel workloads by applying two prioritization methods, request reordering and cache bypassing, to memory requests before they access a cache. MRPB then releases requests into the cache in a more cache-friendly order. The result is drastically reduced cache contention and improved use of the limited per-thread cache capacity. For a simulated 16 KB L1 cache, MRPB improves the average performance of the entire PolyBench and Rodinia suites by 2.65x and 1.27x, respectively, outperforming a state-of-the-art GPU cache management technique.
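
The two ideas named in the abstract, request reordering and cache bypassing, can be illustrated with a toy software model. The sketch below is not the authors' hardware design: the fully associative LRU cache, the grouping of requests by warp ID, the static set of bypassed warps, and all names (`LRUCache`, `reorder_by_warp`, `run`, `arrival`) are assumptions made purely for illustration. It only shows why releasing interleaved requests in a grouped order, or keeping some requests out of the cache, can reduce contention in a small cache.

```python
from collections import OrderedDict, defaultdict


class LRUCache:
    """Toy fully associative LRU cache over cache-line addresses (illustrative only)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, line_addr):
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.lines[line_addr] = True
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def reorder_by_warp(requests):
    """Hypothetical reordering: sort requests into per-warp FIFO queues,
    then drain the queues one warp at a time."""
    queues = defaultdict(list)
    for warp_id, line_addr in requests:
        queues[warp_id].append(line_addr)
    return [(w, a) for w in sorted(queues) for a in queues[w]]


def run(requests, num_lines, bypass_warps=frozenset()):
    """Replay requests; requests from warps in bypass_warps skip the cache.
    The static bypass set is an assumption, not a policy from the paper."""
    cache = LRUCache(num_lines)
    bypassed = 0
    for warp_id, line_addr in requests:
        if warp_id in bypass_warps:
            bypassed += 1
            continue
        cache.access(line_addr)
    return cache.hits, cache.misses, bypassed


# Four warps, each reusing its own two lines, arriving round-robin.
arrival = [(w, 10 * w + i % 2) for i in range(4) for w in range(4)]

print(run(arrival, num_lines=2))                          # interleaved arrival order
print(run(reorder_by_warp(arrival), num_lines=2))         # grouped by warp
print(run(arrival, num_lines=2, bypass_warps={1, 2, 3}))  # only warp 0 uses the cache
```

In this toy setup the interleaved stream thrashes the two-line cache, while the reordered stream lets each warp reuse its lines before the next warp evicts them. Bypassing protects the lines of the one warp that still uses the cache, at the cost of sending the bypassed requests to the next memory level.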


Pages: 272-283

Page count: 12

