Balancing Reliability, Cost, and Performance Tradeoffs with FreeFault

Cited by: 0
Authors
Kim, Dong Wan [1]
Erez, Mattan [1]
Affiliations
[1] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
Source
2015 IEEE 21ST INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA) | 2015
Funding
U.S. National Science Foundation;
Keywords
MEMORY;
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Memory errors have been a major source of system failures, and fault rates may rise even further as memory continues to scale. This increasing fault rate, especially when combined with the advent of integrated on-package memories, may exceed the capabilities of traditional fault tolerance mechanisms or significantly increase their overhead. In this paper, we present FreeFault as a hardware-only, transparent, and nearly-free resilience mechanism that is implemented entirely within a processor and can tolerate the majority of DRAM faults. FreeFault repurposes portions of the last-level cache for storing retired memory regions and augments a hardware memory scrubber to monitor memory health and aid retirement decisions. Because it relies on existing structures (cache associativity) for retirement/remapping-type repair, FreeFault has essentially no hardware overhead. Because it requires only a very modest portion of the cache (as small as 8KB) to cover a large fraction of DRAM faults, FreeFault has almost no impact on performance. We explain how FreeFault adds an attractive layer in an overall resilience scheme of highly-reliable and highly-available systems by delaying, and even entirely avoiding, calling upon software to make tradeoff decisions between memory capacity, performance, and reliability.
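The retirement idea in the abstract (keeping data for faulty DRAM regions resident in the last-level cache) can be illustrated with a toy model. The sketch below is not the authors' design: the class name `RetiringCache`, the 64-byte line size, the LRU policy, and the per-set pinning budget `max_pinned_per_set` are all assumptions made for illustration. It only shows how set associativity lets a few ways be pinned while the remaining ways keep serving ordinary accesses.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size, not taken from the paper


class RetiringCache:
    """Toy set-associative LRU cache that pins lines backing faulty DRAM blocks."""

    def __init__(self, num_sets=4, ways=4, max_pinned_per_set=1):
        # The budget must leave at least one unpinned way per set.
        assert max_pinned_per_set < ways
        self.num_sets = num_sets
        self.ways = ways
        self.max_pinned_per_set = max_pinned_per_set
        # One OrderedDict per set: tag -> pinned flag, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _locate(self, addr):
        block = addr // LINE_BYTES
        return block % self.num_sets, block // self.num_sets

    @staticmethod
    def _evict(lines):
        # Evict the least recently used line that is not pinned.
        victim = next(tag for tag, pinned in lines.items() if not pinned)
        del lines[victim]

    def retire(self, addr):
        """Pin the line for a faulty block; False means the set's budget is used up."""
        idx, tag = self._locate(addr)
        lines = self.sets[idx]
        if lines.get(tag):
            return True  # already retired
        if sum(lines.values()) >= self.max_pinned_per_set:
            return False  # hypothetical: the caller falls back to another mechanism
        if tag not in lines and len(lines) >= self.ways:
            self._evict(lines)
        lines[tag] = True
        lines.move_to_end(tag)
        return True

    def access(self, addr):
        """Return 'hit' or 'miss'; pinned lines stay resident and always hit."""
        idx, tag = self._locate(addr)
        lines = self.sets[idx]
        if tag in lines:
            lines.move_to_end(tag)
            return "hit"
        if len(lines) >= self.ways:
            self._evict(lines)
        lines[tag] = False
        return "miss"
```

In this model, a retired block never leaves the cache, so accesses to it no longer depend on the faulty DRAM location; the cost is one fewer way available to ordinary data in that set.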
Pages: 439-450
Number of pages: 12