Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors

Authors
Garcia, Victor [1, 2]
Rico, Alejandro [2]
Villavieja, Carlos [3]
Carpenter, Paul [2]
Navarro, Nacho [1, 2]
Ramirez, Alex [4]
Affiliations
[1] Univ Politecn Cataluna, Barcelona, Spain
[2] Barcelona Supercomp Ctr, Barcelona, Spain
[3] Google Inc, New York, NY USA
[4] NVIDIA Corp, Santa Clara, CA USA
Keywords
Cache memories; Prefetch; Task-based programming models
DOI
10.1007/s10766-016-0431-8
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Memory stalls are a significant source of performance degradation in modern processors. Data prefetching is a widely adopted and well-studied technique for alleviating this problem. Prefetching can be performed by the hardware, or be initiated and controlled by software. Software-controlled prefetching covers a wide variety of schemes, including runtime-directed prefetching and, more specifically, runtime-directed block prefetching. This paper proposes a hybrid prefetching mechanism that integrates a software-driven block prefetcher with existing hardware prefetching techniques. Our runtime-assisted software prefetcher brings large blocks of data on-chip with the support of a low-cost hardware engine, and works in synergy with existing hardware prefetchers that manage locality at a finer granularity. The runtime system that drives the prefetch engine dynamically selects which cache to prefetch to. On a set of scientific benchmarks, our evaluation shows a maximum speedup of 32 % and an average speedup of 10 % over a baseline with hardware prefetching only. As a result, we also reduce energy-to-solution by up to 18 %, and by 3 % on average.
Pages: 530-550
Number of pages: 21
Related papers
  • [1] Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors
    Victor Garcia
    Alejandro Rico
    Carlos Villavieja
    Paul Carpenter
    Nacho Navarro
    Alex Ramirez
    International Journal of Parallel Programming, 2017, 45 : 530 - 550
  • [2] B-Fetch: Branch Prediction Directed Prefetching for Chip-Multiprocessors
    Kadjo, David
    Kim, Jinchun
    Sharma, Prabal
    Panda, Reena
    Gratz, Paul
    Jimenez, Daniel
    2014 47TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), 2014, : 623 - 634
  • [3] Runtime 3-D Stacked Cache Management for Chip-Multiprocessors
    Jung, Jongpil
    Kang, Kyungsu
    De Micheli, Giovanni
    Kyung, Chong-Min
    PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2013), 2013, : 68 - 72
  • [4] Adaptive Prefetching for Shared Cache Based Chip Multiprocessors
    Kandemir, Mahmut
    Zhang, Yuanrui
    Ozturk, Ozcan
    DATE: 2009 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, VOLS 1-3, 2009, : 773 - +
  • [5] Characterization of TCC on chip-multiprocessors
    McDonald, A
    Chung, JW
    Chafi, H
    Minh, CC
    Carlstrom, BD
    Hammond, L
    Kozyrakis, C
    Olukotun, K
    PACT 2005: 14TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, 2005, : 63 - 74
  • [6] A Discrete Thermal Controller for Chip-Multiprocessors
    Cui, Yingnan
    Zhang, Wei
    He, Bingsheng
    PROCEEDINGS OF THE 2016 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2016, : 67 - 72
  • [7] Virtualizing network-on-chip resources in chip-multiprocessors
    Trivino, Francisco
    Sanchez, Jose L.
    Alfaro, Francisco J.
    Flich, Jose
    MICROPROCESSORS AND MICROSYSTEMS, 2011, 35 (02) : 230 - 245
  • [8] Fair Access to External Memory for Chip-multiprocessors
    Yang, Shufan
    Wu, Qiang
    Xiao, Xiongren
    Li, Renfa
    Hillenbrand, Dominic
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 300 - 305
  • [9] Temperature-Aware Runtime Power Management for Chip-Multiprocessors with 3-D Stacked Cache
    Kang, Kyungsu
    De Micheli, Giovanni
    Lee, Seunghan
    Kyung, Chong-Min
    PROCEEDINGS OF THE FIFTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2014), 2015, : 163 - +
  • [10] GigaNoC - A hierarchical Network-on-Chip for scalable Chip-Multiprocessors
    Puttmann, Christoph
    Niemann, Joerg-Christian
    Porrmann, Mario
    Rueckert, Ulrich
    DSD 2007: 10TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN ARCHITECTURES, METHODS AND TOOLS, PROCEEDINGS, 2007, : 495 - +