Maestro: Orchestrating Lifetime Reliability in Chip Multiprocessors

Authors
Feng, Shuguang [1]
Gupta, Shantanu [1]
Ansari, Amin [1]
Mahlke, Scott [1]
Affiliations
[1] Univ Michigan, Adv Comp Architecture Lab, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA)
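Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract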
As CMOS feature sizes venture deep into the nanometer regime, wearout mechanisms including negative-bias temperature instability and time-dependent dielectric breakdown can severely reduce processor operating lifetimes and performance. This paper presents an introspective reliability management system, Maestro, to tackle reliability challenges in future chip multiprocessors (CMPs) head-on. Unlike traditional approaches, Maestro relies on low-level sensors to monitor the CMP as it ages (introspection). Leveraging this real-time assessment of CMP health, runtime heuristics identify wearout-centric job assignments (management). By exploiting the complementary effects of the natural heterogeneity (due to process variation and wearout) that exists in CMPs and the diversity found in system workloads, Maestro composes job schedules that intelligently control the aging process. Monte Carlo experiments show that Maestro significantly enhances lifetime reliability through intelligent wear-leveling, increasing the expected service life of a population of 16-core CMPs by as much as 38% compared to a naive, round-robin scheduler. Furthermore, in the presence of process variation, Maestro's wearout-centric scheduling outperformed both performance-counter-based and temperature-sensor-based schedulers, achieving an order of magnitude more improvement in lifetime throughput, the amount of useful work done by a system prior to failure.
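The contrast the abstract draws between a naive round-robin scheduler and a wearout-centric one can be illustrated with a minimal sketch. This is not Maestro's actual heuristic, which the abstract does not specify. The `Core.health` and `Job.stress` fields, their [0, 1] scales, and the greedy pairing rule are all assumptions made for illustration, and the sketch assumes there are no more jobs than cores.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    health: float  # hypothetical sensor-derived score; 1.0 = like new, 0.0 = worn out


@dataclass
class Job:
    name: str
    stress: float  # hypothetical estimate of how much wear the job induces


def round_robin(cores, jobs):
    """Naive baseline: assign jobs to cores in order, ignoring health."""
    return {job.name: cores[i % len(cores)].core_id for i, job in enumerate(jobs)}


def wear_aware(cores, jobs):
    """Illustrative wear-leveling: pair the most stressful job with the
    healthiest core, the next most stressful with the next healthiest,
    and so on. Assumes len(jobs) <= len(cores); extra jobs are dropped by zip.
    """
    by_health = sorted(cores, key=lambda c: c.health, reverse=True)
    by_stress = sorted(jobs, key=lambda j: j.stress, reverse=True)
    return {job.name: core.core_id for job, core in zip(by_stress, by_health)}


cores = [Core(0, 0.4), Core(1, 0.9), Core(2, 0.7)]
jobs = [Job("a", 0.2), Job("b", 0.8), Job("c", 0.5)]

baseline = round_robin(cores, jobs)
leveled = wear_aware(cores, jobs)
print(baseline)
print(leveled)
```

In this toy input the round-robin baseline places the lightest job on the most worn core only by coincidence of ordering, while the wear-aware rule does so by construction and sends the heaviest job to the healthiest core.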
Pages: 186-200
Page count: 15