Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors

Cited by: 0
Authors
Garcia, Victor [1 ,2 ]
Rico, Alejandro [2 ]
Villavieja, Carlos [3 ]
Carpenter, Paul [2 ]
Navarro, Nacho [1 ,2 ]
Ramirez, Alex [4 ]
Affiliations
[1] Univ Politecn Cataluna, Barcelona, Spain
[2] Barcelona Supercomp Ctr, Barcelona, Spain
[3] Google Inc, New York, NY USA
[4] NVIDIA Corp, Santa Clara, CA USA
Keywords
Cache memories; Prefetch; Task based programming models;
DOI
10.1007/s10766-016-0431-8
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Memory stalls are a significant source of performance degradation in modern processors. Data prefetching is a widely adopted and well-studied technique used to alleviate this problem. Prefetching can be performed by the hardware, or be initiated and controlled by software. Software-controlled prefetching covers a wide variety of schemes, including runtime-directed prefetching and, more specifically, runtime-directed block prefetching. This paper proposes a hybrid prefetching mechanism that integrates a software-driven block prefetcher with existing hardware prefetching techniques. Our runtime-assisted software prefetcher brings large blocks of data on-chip with the support of a low-cost hardware engine, and synergizes with existing hardware prefetchers that manage locality at a finer granularity. The runtime system that drives the prefetch engine dynamically selects which cache to prefetch to. On a set of scientific benchmarks, our evaluation shows a maximum speedup of 32 % and an average speedup of 10 % compared to a baseline with hardware prefetching only. As a result, we also reduce energy-to-solution by up to 18 %, and by 3 % on average.
Pages: 530-550
Page count: 21