PARMA: Parallelization-Aware Run-Time Management for Energy-Efficient Many-Core Systems

Citations: 7
Authors
Al-hayanni, Mohammed A. Noaman [1, 2]
Rafiev, Ashur [3, 4]
Xia, Fei [4 ]
Shafik, Rishad [5 ]
Romanovsky, Alexander [3 ]
Yakovlev, Alex [1 ]
Affiliations
[1] Newcastle Univ, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[2] Univ Technol Baghdad, Dept Elect Engn, Baghdad 10001, Iraq
[3] Newcastle Univ, Sch Comp, Newcastle Upon Tyne, Tyne & Wear, England
[4] Newcastle Univ, Sch Engn, Newcastle Upon Tyne, Tyne & Wear, England
[5] Newcastle Univ, Elect Syst, Newcastle Upon Tyne, Tyne & Wear, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
IP networks; Computational modeling; Hardware; System performance; Optimization; Measurement; Monitoring; Run-time management; many-core; speedup; power modelling; energy-delay-product; energy per instruction; POWER; PERFORMANCE; VOLTAGE;
DOI
10.1109/TC.2020.2975787
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Performance and energy efficiency considerations have shifted computing paradigms from single-core to many-core architectures. At the same time, traditional speedup models such as Amdahl's Law face challenges in run-time reasoning about system performance and energy efficiency, because these models typically assume limited variations of the parallel fraction. Moreover, the parallel fraction, which varies dynamically in workloads, is generally unknown at run-time without application-level instrumentation. This article describes novel performance/energy trade-off models based on realistic architectural considerations, which describe the parallel fraction and speedup as functions of performance counter values available in modern processors, removing the need for application-level instrumentation. These are then used to develop a Parallelization-Aware Run-time Management (PARMA) approach. PARMA aims at controlling core allocations and operating voltage/frequency points for energy efficiency, according to the varying workload parallel fractions. The efficacy of our models and the PARMA approach is extensively validated using a number of PARSEC benchmark applications, involving two performance/energy trade-off metrics: energy-delay-product (EDP), typically used in high-performance applications, and energy per instruction (EPI), suitable for energy-aware applications. Improvements of up to 48 percent in EDP and up to 68 percent in EPI have been observed using the PARMA approach compared with parallelization-agnostic methods.
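For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the textbook Amdahl's Law speedup and the usual definitions of EDP and EPI. It is not the paper's counter-based model or the PARMA algorithm. The function names, the linear power model (`idle_w + n * core_w`), and the brute-force search over core counts are all assumptions made purely for illustration.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Textbook Amdahl's Law: speedup on n cores for parallel fraction p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel fraction p must be in [0, 1]")
    if n < 1:
        raise ValueError("core count n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def energy_delay_product(power_w: float, time_s: float) -> float:
    """EDP = energy * delay = (power * time) * time."""
    return power_w * time_s * time_s


def energy_per_instruction(power_w: float, time_s: float, instructions: int) -> float:
    """EPI = energy / retired instructions."""
    return power_w * time_s / instructions


def best_core_count(p: float, max_cores: int, base_time_s: float,
                    idle_w: float, core_w: float) -> int:
    """Pick the core count with the lowest EDP.

    Hypothetical toy model (an assumption, not taken from the paper):
    power grows linearly with active cores and execution time follows
    Amdahl's Law.
    """
    def edp(n: int) -> float:
        time_s = base_time_s / amdahl_speedup(p, n)
        return energy_delay_product(idle_w + n * core_w, time_s)

    return min(range(1, max_cores + 1), key=edp)
```

Under this toy model a fully serial workload (`p = 0.0`) is best served by a single core, whereas a fully parallel one (`p = 1.0`) favours the largest allocation. This illustrates why knowing the parallel fraction at run-time matters for core allocation decisions.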
Pages: 1507-1518
Number of pages: 12