Latency-aware DVFS for efficient power state transitions on many-core architectures

Cited by: 0

Authors:

Zhiquan Lai

King Tin Lam

Cho-Li Wang

Jinshu Su

Affiliations:

[1] National University of Defense Technology, National Key Laboratory of Parallel and Distributed Processing, College of Computer

[2] The University of Hong Kong, Department of Computer Science

Source:

The Journal of Supercomputing | 2015 / Vol. 71

Keywords:

Power management; Dynamic voltage and frequency scaling; Profiling; Shared virtual memory; Many-core processors; The single-chip cloud computer;

DOI:

Not available

Abstract:

Energy efficiency is quickly becoming a first-class design constraint in high-performance computing (HPC). More efficient power management solutions are needed to reduce the energy costs and carbon footprint of HPC systems. Dynamic voltage and frequency scaling (DVFS) is a commonly used power management technique that trades off power consumption against system performance according to time-varying program behavior. However, prior work on DVFS seldom takes into account the voltage and frequency scaling latencies, which we found to be a crucial factor determining the efficiency of the power management scheme. Frequent power state transitions without latency awareness can noticeably degrade the execution performance of applications. The design of multiple voltage domains in some many-core architectures makes the effect of DVFS latencies even more significant. These concerns lead us to propose a new latency-aware DVFS scheme that selects the optimal power state more accurately. Our main idea is to analyze the latency characteristics in depth and design a novel profile-guided DVFS solution that exploits the varying execution patterns of the parallel program to avoid excessive power state transitions. We implement the solution as a power management library for use by shared-memory parallel applications. Experimental evaluation on the Intel SCC many-core platform shows a significant improvement in power efficiency with our scheme. Compared with a latency-unaware approach, we achieve 24.0% extra energy saving, 31.3% more reduction in the energy-delay product, and 15.2% less overhead in execution time in the average case across various benchmarks. Our algorithm is also shown to outperform a prior DVFS approach that attempted to mitigate the latency effects.
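To illustrate why transition latency matters, the following is a minimal sketch of a break-even check for one program phase. It is not the authors' algorithm. The function names, the cost model, and the numbers in the usage example are all assumptions made for illustration. The model assumes a fixed-length waiting phase, one transition down and one back up, and that each transition stalls the core for the latency period at the high power level.

```python
def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """Energy-delay product: energy (J) times execution time (s); lower is better."""
    return energy_j * delay_s


def break_even_phase_s(latency_s: float, p_high_w: float, p_low_w: float) -> float:
    """Shortest phase for which switching to the low power state saves energy.

    Assumed model (illustrative, not from the paper):
      stay:   E = p_high * phase
      switch: E = p_high * 2 * latency + p_low * phase
    Switching wins when phase > 2 * latency * p_high / (p_high - p_low).
    """
    if p_low_w >= p_high_w:
        raise ValueError("low power state must draw less power than the high state")
    return 2.0 * latency_s * p_high_w / (p_high_w - p_low_w)


def worth_switching(phase_s: float, latency_s: float,
                    p_high_w: float, p_low_w: float) -> bool:
    """Latency-aware decision: scale down only if the phase outlasts the break-even point."""
    return phase_s > break_even_phase_s(latency_s, p_high_w, p_low_w)


if __name__ == "__main__":
    # Hypothetical numbers: 5 ms per transition, 2 W high state, 1 W low state.
    print(break_even_phase_s(0.005, 2.0, 1.0))
    print(worth_switching(0.050, 0.005, 2.0, 1.0))  # long phase
    print(worth_switching(0.010, 0.005, 2.0, 1.0))  # short phase
```

Under this simplified model, a latency-unaware policy would scale down for both phases, while the latency-aware check skips the short one because the transition overhead would outweigh the saving.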

Pages: 2720-2747

Page count: 27

