Revolver: Processor Architecture for Power Efficient Loop Execution

Cited by: 0

Authors:

Hayenga, Mitchell [1,2]

Naresh, Vignyan Reddy Kothinti [2]

Lipasti, Mikko H. [2]

Affiliations:

[1] ARM Inc, Cambridge, England

[2] Univ Wisconsin, Madison, WI 53706 USA

Source:

2014 20TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA-20) | 2014

Keywords:

CACHE;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

With the rise of mobile and cloud-based computing, modern processor design has become the task of achieving maximum power efficiency at specific performance targets. This trend, coupled with dwindling improvements in single-threaded performance, has led architects to focus predominantly on energy efficiency. In this paper we note that for the majority of benchmarks, a substantial portion of execution time is spent executing simple loops. Capitalizing on the frequency of loops, we design an out-of-order processor architecture that achieves an aggressive level of performance while minimizing the energy consumed during the execution of loops. The Revolver architecture achieves energy efficiency during loop execution by enabling "in-place execution" of loops within the processor's out-of-order backend. Essentially, a few static instances of each loop instruction are dispatched to the out-of-order execution core by the processor frontend. The static instruction instances may each be executed multiple times in order to complete all necessary loop iterations. During loop execution the processor frontend, including instruction fetch, branch prediction, decode, allocation, and dispatch logic, can be completely clock gated. Additionally, we propose a mechanism to pre-execute future loop iteration load instructions, thereby realizing parallelism beyond the loop iterations currently executing within the processor core. Employing Revolver across three benchmark suites, we eliminate 20%, 55%, and 84% of all frontend instruction dispatches. Overall, we find Revolver maintains performance, while resulting in 5.3%-18.3% energy-delay benefit over loop buffers or micro-op cache techniques alone.

Pages: 591-602

Page count: 12

50 items in total

[31] Towards Efficient Superconducting Quantum Processor Architecture Design
Li, Gushu
Ding, Yufei
Xie, Yuan
[J]. TWENTY-FIFTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXV), 2020, : 1031 - 1045
[32] An Efficient Stream Memory Architecture for Heterogeneous Multicore Processor
Deng, Rangyu
Xu, Weixia
Dou, Qiang
Zhou, Hongwei
Dai, Zefu
Chen, Haiyan
[J]. ISCC: 2009 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS, VOLS 1 AND 2, 2009, : 287 - 290
[33] Efficient algorithm and architecture for post-processor in HDTV
Lee, JW
Park, JW
Yang, MH
Kang, SH
Choe, Y
[J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1998, 44 (01) : 16 - 26
[34] An energy efficient instruction window for scalable processor architecture
Choi, Min
Maeng, Seungryoul
[J]. IEICE TRANSACTIONS ON ELECTRONICS, 2008, E91C (09) : 1427 - 1436
[35] An efficient PIM (Processor-In-Memory) architecture for BLAST
Kang, JY
Gupta, S
Gaudiot, JL
[J]. CONFERENCE RECORD OF THE THIRTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2, 2004, : 503 - 507
[36] Efficient Fault Detection Architecture Design of Latch-based Low Power DSP/MCU Processor
Yu, Hai
Nicolaidis, Michael
Anghel, Lorena
Zergainoh, Nacer-Eddine
[J]. 2011 16TH IEEE EUROPEAN TEST SYMPOSIUM (ETS), 2011, : 93 - 98
[37] Low-power consumption architecture for embedded processor
Yoshida, Y
Song, BY
Okuhata, H
Onoye, T
Shirakawa, I
[J]. 1996 2ND INTERNATIONAL CONFERENCE ON ASIC, PROCEEDINGS, 1996, : 77 - 80
[38] Low Power Pipelined FFT Processor Architecture on FPGA
Hassan, S. L. M.
Sulaiman, N.
Halim, I. S. A.
[J]. 2018 9TH IEEE CONTROL AND SYSTEM GRADUATE RESEARCH COLLOQUIUM (ICSGRC2018), 2018, : 31 - 34
[39] A Method for Efficient Localization of Magnetic Field Sources Excited by Execution of Instructions in a Processor
Werner, Frank
Chu, Derrick Albert
Djordjevic, Antonije R.
Olcan, Dragan I.
Prvulovic, Milos
Zajic, Alenka
[J]. IEEE TRANSACTIONS ON ELECTROMAGNETIC COMPATIBILITY, 2018, 60 (03) : 613 - 622
[40] Reliable and Efficient Execution of Multiple Streaming Applications on Intel's SCC Processor
Schor, Lars
Rai, Devendra
Yang, Hoeseok
Bacivarov, Iuliana
Thiele, Lothar
[J]. EURO-PAR 2013: PARALLEL PROCESSING WORKSHOPS, 2014, 8374 : 790 - 800
