MRPB: Memory Request Prioritization for Massively Parallel Processors

Authors
Jia, Wenhao [1]
Shaw, Kelly A. [2]
Martonosi, Margaret [1]
Affiliations
[1] Princeton Univ, Princeton, NJ 08544 USA
[2] Univ Richmond, Richmond, VA 23173 USA
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Massively parallel, throughput-oriented systems such as graphics processing units (GPUs) offer high performance for a broad range of programs. They are, however, complex to program, especially because of their intricate memory hierarchies with multiple address spaces. In response, modern GPUs have widely adopted caches, hoping to provide smoother reductions in memory access traffic and latency. Unfortunately, GPU caches often have a mixed or unpredictable performance impact due to cache contention that results from the high thread counts in GPUs. We propose the memory request prioritization buffer (MRPB) to ease GPU programming and improve GPU performance. This hardware structure improves the caching efficiency of massively parallel workloads by applying two prioritization methods, request reordering and cache bypassing, to memory requests before they access a cache. MRPB then releases requests into the cache in a more cache-friendly order. The result is drastically reduced cache contention and improved use of the limited per-thread cache capacity. For a simulated 16 KB L1 cache, MRPB improves the average performance of the entire PolyBench and Rodinia suites by 2.65x and 1.27x respectively, outperforming a state-of-the-art GPU cache management technique.
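The abstract does not describe how MRPB is built, so the following is only a toy software model of the request-reordering idea, not the authors' hardware design. It assumes per-warp FIFO queues drained one queue at a time and a tiny fully associative LRU cache; cache bypassing is not modeled. The names `count_hits` and `reorder_by_warp`, the two-line cache, and the synthetic request stream are all hypothetical choices made for illustration.

```python
from collections import OrderedDict, defaultdict


def count_hits(addresses, capacity):
    """Count hits in a small fully associative LRU cache (assumed model)."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used line
            cache[addr] = True
    return hits


def reorder_by_warp(requests):
    """Buffer (warp_id, address) requests in per-warp FIFO queues,
    then release them queue by queue (assumed drain policy)."""
    queues = defaultdict(list)
    for warp_id, addr in requests:
        queues[warp_id].append(addr)
    return [addr for warp_id in sorted(queues) for addr in queues[warp_id]]


# Synthetic stream: 4 warps, each touching its own 2 cache lines twice,
# with requests from different warps interleaved.
requests = [(w, w * 2 + i) for _ in range(2) for i in range(2) for w in range(4)]

original_order = [addr for _, addr in requests]
reordered = reorder_by_warp(requests)

print("hits without reordering:", count_hits(original_order, capacity=2))
print("hits with reordering:   ", count_hits(reordered, capacity=2))
```

In this synthetic case the interleaved stream thrashes the two-line cache, while the per-warp order lets each warp reuse its own lines before the next warp's requests arrive. The numbers say nothing about the speedups reported in the abstract, which come from the authors' simulated GPU.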
Pages: 272-283
Page count: 12