Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors

Authors
Garcia, Victor [1, 2]
Rico, Alejandro [2]
Villavieja, Carlos [3]
Carpenter, Paul [2]
Navarro, Nacho [1, 2]
Ramirez, Alex [4]
Affiliations
[1] Universitat Politècnica de Catalunya, Barcelona, Spain
[2] Barcelona Supercomputing Center, Barcelona, Spain
[3] Google Inc., New York, NY, USA
[4] NVIDIA Corp., Santa Clara, CA, USA
Keywords
Cache memories; Prefetch; Task-based programming models
DOI
10.1007/s10766-016-0431-8
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Memory stalls are a significant source of performance degradation in modern processors. Data prefetching is a widely adopted and well-studied technique for alleviating this problem. Prefetching can be performed by hardware, or it can be initiated and controlled by software. Software-controlled prefetching covers a wide variety of schemes, including runtime-directed prefetching and, more specifically, runtime-directed block prefetching. This paper proposes a hybrid prefetching mechanism that integrates a software-driven block prefetcher with existing hardware prefetching techniques. Our runtime-assisted software prefetcher brings large blocks of data on-chip with the support of a low-cost hardware engine, and it synergizes with existing hardware prefetchers that manage locality at a finer granularity. The runtime system that drives the prefetch engine dynamically selects which cache to prefetch into. Our evaluation on a set of scientific benchmarks shows a maximum speedup of 32% and an average speedup of 10% over a baseline with hardware prefetching only. As a result, we also reduce energy-to-solution by up to 18%, and by 3% on average.
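To make the abstract's division of labor concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes a task-based runtime knows each task's input regions (base address and size) before the task runs, and it illustrates two steps such a runtime could take. `select_target` picks the smallest cache level whose capacity, scaled by an assumed occupancy factor, holds the task's input footprint, and falls back to hardware prefetching when nothing fits. `block_requests` turns each region into a line-aligned block request for a prefetch engine. The names `Dependency`, `CacheLevel`, `select_target` and `block_requests`, the 64-byte line size, the 0.5 occupancy factor, and the cache sizes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LINE_SIZE = 64  # assumed cache-line size in bytes (hypothetical)


@dataclass
class Dependency:
    """A task input region as a runtime might see it (hypothetical)."""
    start: int  # base address in bytes
    size: int   # region length in bytes, assumed > 0


@dataclass
class CacheLevel:
    name: str
    capacity: int  # bytes


def select_target(deps: List[Dependency], levels: List[CacheLevel],
                  occupancy: float = 0.5) -> Optional[str]:
    """Return the smallest cache whose scaled capacity holds the task's inputs.

    Returns None when the footprint fits nowhere, i.e. this hypothetical
    policy skips block prefetching and leaves the data to hardware prefetchers.
    """
    footprint = sum(d.size for d in deps)
    for level in sorted(levels, key=lambda lvl: lvl.capacity):
        if footprint <= level.capacity * occupancy:
            return level.name
    return None


def block_requests(deps: List[Dependency],
                   line_size: int = LINE_SIZE) -> List[Tuple[int, int]]:
    """Expand each region into a (line-aligned base address, line count) request."""
    requests = []
    for d in deps:
        first_line = d.start - d.start % line_size
        last_byte = d.start + d.size - 1
        last_line = last_byte - last_byte % line_size
        requests.append((first_line, (last_line - first_line) // line_size + 1))
    return requests


if __name__ == "__main__":
    levels = [CacheLevel("L2", 256 * 1024), CacheLevel("L3", 8 * 1024 * 1024)]
    task_inputs = [Dependency(0, 64 * 1024), Dependency(1 << 20, 64 * 1024)]
    print(select_target(task_inputs, levels), block_requests(task_inputs))
```

The fixed occupancy factor is only a placeholder; it stands in for whatever capacity and contention model an actual runtime would use when choosing the destination cache.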
Pages: 530-550
Number of pages: 21