Latency-aware DVFS for efficient power state transitions on many-core architectures

Authors
Zhiquan Lai
King Tin Lam
Cho-Li Wang
Jinshu Su
Affiliations
[1] National University of Defense Technology, National Key Laboratory of Parallel and Distributed Processing, College of Computer
[2] The University of Hong Kong, Department of Computer Science
Keywords
Power management; Dynamic voltage and frequency scaling; Profiling; Shared virtual memory; Many-core processors; The single-chip cloud computer
DOI: not available
Abstract
Energy efficiency is quickly becoming a first-class design constraint in high-performance computing (HPC). More efficient power management solutions are needed to reduce the energy costs and carbon footprint of HPC systems. Dynamic voltage and frequency scaling (DVFS) is a commonly used power management technique that trades off power consumption against system performance according to time-varying program behavior. However, prior work on DVFS seldom takes into account voltage and frequency scaling latencies, which we found to be a crucial factor in the efficiency of a power management scheme. Frequent power state transitions made without latency awareness can noticeably degrade application performance. The multiple voltage domains in some many-core architectures make the effect of DVFS latencies even more significant. These concerns lead us to propose a new latency-aware DVFS scheme that selects the optimal power state more accurately. Our main idea is to analyze the latency characteristics in depth and to design a novel profile-guided DVFS solution that exploits the varying execution patterns of a parallel program to avoid excessive power state transitions. We implement the solution as a power management library for shared-memory parallel applications. Experimental evaluation on the Intel SCC many-core platform shows a significant improvement in power efficiency with our scheme. Compared with a latency-unaware approach, in the average case across various benchmarks we achieve 24.0% extra energy savings, a 31.3% greater reduction in the energy–delay product, and 15.2% less execution-time overhead. Our algorithm is also shown to outperform a prior DVFS approach that attempted to mitigate latency effects.
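The abstract states the core idea only at a high level: avoid power state transitions whose latency cannot be amortized by the program phase that follows. The following is a minimal, hypothetical Python sketch of that general idea, not the authors' algorithm. The `Phase` record, `plan_transitions`, the `min_ratio` threshold, and all frequencies and timings are assumptions chosen for illustration. It also includes the energy–delay product (EDP), i.e. energy multiplied by execution time, one of the metrics reported above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Phase:
    """One profiled program phase (hypothetical profile record)."""
    duration_s: float  # observed duration of the phase, in seconds
    target_mhz: int    # frequency the phase would ideally run at


def plan_transitions(phases: List[Phase], initial_mhz: int,
                     latency_s: float, min_ratio: float = 10.0) -> Tuple[List[int], int]:
    """Choose a frequency per phase, switching only when the phase is long
    enough (>= min_ratio * latency_s) to amortize the transition latency.

    Returns the chosen frequency for each phase and the number of transitions.
    min_ratio=0.0 applies every requested change (latency-unaware behavior).
    """
    current = initial_mhz
    plan: List[int] = []
    transitions = 0
    for phase in phases:
        wants_change = phase.target_mhz != current
        # Latency-aware rule (assumed): skip switches that short phases cannot amortize.
        if wants_change and phase.duration_s >= min_ratio * latency_s:
            current = phase.target_mhz
            transitions += 1
        plan.append(current)
    return plan, transitions


def energy_delay_product(energy_j: float, time_s: float) -> float:
    """EDP = energy x execution time; lower is better."""
    return energy_j * time_s


if __name__ == "__main__":
    # Illustrative profile only; these numbers are not measurements from the paper.
    profile = [Phase(0.5, 800), Phase(0.001, 533), Phase(0.4, 533), Phase(0.002, 800)]
    print(plan_transitions(profile, 800, latency_s=0.01))                 # latency-aware
    print(plan_transitions(profile, 800, latency_s=0.01, min_ratio=0.0))  # latency-unaware
```

With this illustrative profile, the latency-aware call ignores the two very short phases and returns `([800, 800, 533, 533], 1)`, whereas `min_ratio=0.0` returns `([800, 533, 533, 800], 2)`. The sketch does not reproduce the paper's profile-guided scheme; it only shows why ignoring transition latency can lead to extra, unamortized power state changes.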
Pages: 2720-2747
Page count: 27