PPU: A Control Error-Tolerant Processor for Streaming Applications with Formal Guarantees

Times cited: 0
Authors
Golnari, Pareesa Ameneh [1]
Yetim, Yavuz [2,4]
Martonosi, Margaret [3]
Vizel, Yakir [1]
Malik, Sharad [1]
Affiliations
[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
[2] Princeton Univ, Princeton, NJ 08544 USA
[3] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[4] Google, 345 Spear St, San Francisco, CA 94105 USA
Funding
U.S. National Science Foundation;
Keywords
Error-tolerant computing; streaming applications; reliability requirements; progress; control flow; verification; APPROXIMATE; RELIABILITY; POWER; SAFE;
DOI
10.1145/2990502
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
With increasing technology scaling and design complexity there are increasing threats from device and circuit failures. This is expected to worsen with post-CMOS devices. Current error-resilient solutions ensure reliability of circuits through protection mechanisms such as redundancy, error correction, and recovery. However, the costs of these solutions may be high, rendering them impractical. In contrast, error-tolerant solutions allow errors in the computation and are positioned to be suitable for error-tolerant applications such as media applications. For such programmable error-tolerant processors, the Instruction-Set-Architecture (ISA) no longer serves as a specification since it is acceptable for the processor to allow for errors during the execution of instructions. In this work, we address this specification gap by defining the basic requirements needed for an error-tolerant processor to provide acceptable results. Furthermore, we formally define properties that capture these requirements. Based on this, we propose the Partially Protected Uniprocessor (PPU), an error-tolerant processor that aims to meet these requirements with low-cost microarchitectural support. These protection mechanisms convert potentially fatal control errors to potentially tolerable data errors instead of ensuring instruction-level or byte-level correctness. The protection mechanisms in PPU protect the system against crashes, unresponsiveness, and external device corruption. In addition, they also provide support for achieving acceptable result quality. Additionally, we provide a methodology that formally proves the specification properties on PPU using model checking. This methodology uses models for the hardware and software that are integrated with the fault and recovery models. Finally, we experimentally demonstrate the results of model checking and the application-level quality of results for PPU.
Pages: 29
Related papers
49 in total
  • [11] Ghostwriter: A Cache Coherence Protocol for Error-Tolerant Applications
    Kao, Henry
    San Miguel, Joshua
    Jerger, Natalie Enright
    50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOP PROCEEDINGS - ICPP WORKSHOPS '21, 2021,
  • [12] Branch and Data Herding: Reducing Control and Memory Divergence for Error-Tolerant GPU Applications
    Sartori, John
    Kumar, Rakesh
    IEEE TRANSACTIONS ON MULTIMEDIA, 2013, 15 (02) : 279 - 290
  • [13] Branch and Data Herding: Reducing Control and Memory Divergence for Error-tolerant GPU Applications
    Sartori, John
    Kumar, Rakesh
    PROCEEDINGS OF THE 21ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT'12), 2012, : 427 - 430
  • [14] Recovery-Driven Design: A Power Minimization Methodology for Error-Tolerant Processor Modules
    Kahng, Andrew B.
    Kang, Seokhyeong
    Kumar, Rakesh
    Sartori, John
    PROCEEDINGS OF THE 47TH DESIGN AUTOMATION CONFERENCE, 2010, : 825 - 830
  • [15] Efficient ATFA design based on CNTFET technology for error-tolerant applications
    Rad, Rabe'e Sharifi
    Ghanatghestani, Mokhtar Mohammadi
    Hashemipour, Malihe
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (02) : 1119 - 1143
  • [16] Parsimonious Circuits for Error-Tolerant Applications through Probabilistic Logic Minimization
    Lingamneni, Avinash
    Enz, Christian
    Palem, Krishna
    Piguet, Christian
    INTEGRATED CIRCUIT AND SYSTEM DESIGN: POWER AND TIMING MODELING, OPTIMIZATION, AND SIMULATION, 2011, 6951 : 204 - +
  • [17] RESAC: A redundancy strategy involving approximate computing for error-tolerant applications
    Balasubramanian, Padmanabhan
    Maskell, Douglas L.
    Prasad, Krishnamachar
    MICROELECTRONICS RELIABILITY, 2023, 150
  • [18] RISC-V Core with Approximate Multiplier for Error-Tolerant Applications
    Verma, Anu
    Sharma, Priyamvada
    Das, Bishnu Prasad
    2022 25TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD), 2022, : 239 - 246
  • [19] Layer-Sensitive Neural Processing Architecture for Error-Tolerant Applications
    Li, Zeju
    Wang, Qinfan
    Zou, Zihan
    Shen, Qiao
    Xie, Na
    Cai, Hao
    Zhang, Hao
    Liu, Bo
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2024, 32 (05) : 797 - 809
  • [20] Energy-efficient approximate full adders for error-tolerant applications
    Ahmadi, Farshid
    Semati, Mohammad R.
    Daryanavard, Hassan
    Minaeifar, Atefeh
    COMPUTERS & ELECTRICAL ENGINEERING, 2023, 110