Revolver: Processor Architecture for Power Efficient Loop Execution

Citations: 0
Authors
Hayenga, Mitchell [1, 2]
Naresh, Vignyan Reddy Kothinti [2 ]
Lipasti, Mikko H. [2 ]
Affiliations
[1] ARM Inc, Cambridge, England
[2] Univ Wisconsin, Madison, WI 53706 USA
Keywords
CACHE;
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
With the rise of mobile and cloud-based computing, modern processor design has become the task of achieving maximum power efficiency at specific performance targets. This trend, coupled with dwindling improvements in single-threaded performance, has led architects to focus predominantly on energy efficiency. In this paper, we note that for the majority of benchmarks, a substantial portion of execution time is spent executing simple loops. Capitalizing on the frequency of loops, we design an out-of-order processor architecture that achieves an aggressive level of performance while minimizing the energy consumed during the execution of loops. The Revolver architecture achieves energy efficiency during loop execution by enabling "in-place execution" of loops within the processor's out-of-order backend. Essentially, a few static instances of each loop instruction are dispatched to the out-of-order execution core by the processor frontend. The static instruction instances may each be executed multiple times in order to complete all necessary loop iterations. During loop execution, the processor frontend, including instruction fetch, branch prediction, decode, allocation, and dispatch logic, can be completely clock gated. Additionally, we propose a mechanism to pre-execute future loop iteration load instructions, thereby realizing parallelism beyond the loop iterations currently executing within the processor core. Employing Revolver across three benchmark suites, we eliminate 20, 55, and 84% of all frontend instruction dispatches. Overall, we find Revolver maintains performance, while resulting in 5.3%-18.3% energy-delay benefit over loop buffers or micro-op cache techniques alone.
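The dispatch savings described in the abstract can be illustrated with a toy count. This is only a sketch under stated assumptions, not the paper's model: the function name `frontend_dispatches`, the parameter `static_copies`, and the loop sizes are all hypothetical, and the model ignores everything except how many instructions the frontend hands to the backend for a single loop.

```python
def frontend_dispatches(body_len, iterations, static_copies=None):
    """Count frontend dispatches for one loop in a toy model.

    static_copies=None models a conventional core, where every dynamic
    instruction passes through the frontend. Otherwise the frontend
    dispatches only `static_copies` copies of the loop body, and the
    backend re-executes those copies in place (an assumed simplification).
    """
    if static_copies is None:
        return body_len * iterations
    # Never dispatch more copies than there are iterations to run.
    return body_len * min(static_copies, iterations)


# Illustrative numbers only: an 8-instruction loop running 1000 iterations.
baseline = frontend_dispatches(8, 1000)
in_place = frontend_dispatches(8, 1000, static_copies=4)
eliminated = 1 - in_place / baseline
print(f"baseline={baseline}, in_place={in_place}, eliminated={eliminated:.1%}")
```

The per-loop saving in this sketch is far higher than the suite-wide figures quoted in the abstract, since those cover whole programs in which only part of the execution time is spent in simple loops.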
Pages: 591-602
Page count: 12
Related papers
50 in total
  • [31] Towards Efficient Superconducting Quantum Processor Architecture Design
    Li, Gushu
    Ding, Yufei
    Xie, Yuan
    [J]. TWENTY-FIFTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXV), 2020, : 1031 - 1045
  • [32] An Efficient Stream Memory Architecture for Heterogeneous Multicore Processor
    Deng, Rangyu
    Xu, Weixia
    Dou, Qiang
    Zhou, Hongwei
    Dai, Zefu
    Chen, Haiyan
    [J]. ISCC: 2009 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS, VOLS 1 AND 2, 2009, : 287 - 290
  • [33] Efficient algorithm and architecture for post-processor in HDTV
    Lee, JW
    Park, JW
    Yang, MH
    Kang, SH
    Choe, Y
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1998, 44 (01) : 16 - 26
  • [34] An energy efficient instruction window for scalable processor architecture
    Choi, Min
    Maeng, Seungryoul
    [J]. IEICE TRANSACTIONS ON ELECTRONICS, 2008, E91C (09) : 1427 - 1436
  • [35] An efficient PIM (Processor-In-Memory) architecture for BLAST
    Kang, JY
    Gupta, S
    Gaudiot, JL
    [J]. CONFERENCE RECORD OF THE THIRTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2, 2004, : 503 - 507
  • [36] Efficient Fault Detection Architecture Design of Latch-based Low Power DSP/MCU Processor
    Yu, Hai
    Nicolaidis, Michael
    Anghel, Lorena
    Zergainoh, Nacer-Eddine
    [J]. 2011 16TH IEEE EUROPEAN TEST SYMPOSIUM (ETS), 2011, : 93 - 98
  • [37] Low-power consumption architecture for embedded processor
    Yoshida, Y
    Song, BY
    Okuhata, H
    Onoye, T
    Shirakawa, I
    [J]. 1996 2ND INTERNATIONAL CONFERENCE ON ASIC, PROCEEDINGS, 1996, : 77 - 80
  • [38] Low Power Pipelined FFT Processor Architecture on FPGA
    Hassan, S. L. M.
    Sulaiman, N.
    Halim, I. S. A.
    [J]. 2018 9TH IEEE CONTROL AND SYSTEM GRADUATE RESEARCH COLLOQUIUM (ICSGRC2018), 2018, : 31 - 34
  • [39] A Method for Efficient Localization of Magnetic Field Sources Excited by Execution of Instructions in a Processor
    Werner, Frank
    Chu, Derrick Albert
    Djordjevic, Antonije R.
    Olcan, Dragan I.
    Prvulovic, Milos
    Zajic, Alenka
    [J]. IEEE TRANSACTIONS ON ELECTROMAGNETIC COMPATIBILITY, 2018, 60 (03) : 613 - 622
  • [40] Reliable and Efficient Execution of Multiple Streaming Applications on Intel's SCC Processor
    Schor, Lars
    Rai, Devendra
    Yang, Hoeseok
    Bacivarov, Iuliana
    Thiele, Lothar
    [J]. EURO-PAR 2013: PARALLEL PROCESSING WORKSHOPS, 2014, 8374 : 790 - 800