Improving the reliability of commodity operating systems

Times cited: 76
Authors
Swift, MM [1]
Bershad, BN [1]
Levy, HM [1]
Affiliation
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Source
ACM TRANSACTIONS ON COMPUTER SYSTEMS | 2005 / Vol. 23 / No. 01
Keywords
reliability; management; recovery; device drivers; virtual memory; protection; I/O
DOI
10.1145/1047915.1047919
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85% of recently reported failures. This article describes Nooks, a reliability subsystem that seeks to greatly enhance operating system (OS) reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, our goal is to prevent the vast majority of driver-caused crashes with little or no change to the existing driver and system code. Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them from corrupting the kernel. Nooks also tracks a driver's use of kernel resources to facilitate automatic cleanup during recovery. To prove the viability of our approach, we implemented Nooks in the Linux operating system and used it to fault-isolate several device drivers. Our results show that Nooks offers a substantial increase in the reliability of operating systems, catching and quickly recovering from many faults that would otherwise crash the system. Under a wide range and number of fault conditions, we show that Nooks recovers automatically from 99% of the faults that otherwise cause Linux to crash. While Nooks was designed for drivers, our techniques generalize to other kernel extensions. We demonstrate this by isolating a kernel-mode file system and an in-kernel Internet service. Overall, because Nooks supports existing C-language extensions, runs on a commodity operating system and hardware, and enables automated recovery, it represents a substantial step beyond the specialized architectures and type-safe languages required by previous efforts directed at safe extensibility.
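The abstract describes two cooperating mechanisms: a protection boundary that catches a driver's faults before they corrupt the kernel, and per-driver tracking of kernel resources so that recovery can release whatever a failed driver still held. The following Python sketch is a toy model of that bookkeeping only, under stated assumptions: `DriverFault`, `ObjectTracker`, `IsolatedDriver`, and the `restart` hook are hypothetical names, not Nooks interfaces; a Python exception stands in for the hardware and software fault detection Nooks performs inside the kernel; and lightweight protection domains themselves are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class DriverFault(Exception):
    """Hypothetical stand-in for a fault detected inside an isolated driver."""


@dataclass
class ObjectTracker:
    """Hypothetical per-driver record of the kernel resources it holds."""
    # Maps a resource id to the routine that undoes its acquisition.
    held: Dict[str, Callable[[], None]] = field(default_factory=dict)

    def acquire(self, rid: str, release: Callable[[], None]) -> None:
        self.held[rid] = release

    def release(self, rid: str) -> None:
        # Normal path: the driver frees the resource itself.
        self.held.pop(rid)()

    def reclaim_all(self) -> List[str]:
        # Recovery path: free everything the failed driver still owns,
        # newest first (dicts keep insertion order in Python 3.7+).
        reclaimed = []
        for rid in reversed(list(self.held)):
            self.held.pop(rid)()
            reclaimed.append(rid)
        return reclaimed


class IsolatedDriver:
    """Hypothetical wrapper that routes every driver call through a fault check."""

    def __init__(self, name: str, restart: Callable[[], None]) -> None:
        self.name = name
        self.tracker = ObjectTracker()
        self.restart = restart  # assumed hook for whatever recovery follows cleanup
        self.recoveries = 0

    def call(self, entry, *args):
        try:
            return entry(self.tracker, *args)
        except DriverFault:
            # Contain the fault: reclaim tracked resources, then recover
            # instead of letting the failure crash the whole system.
            self.tracker.reclaim_all()
            self.recoveries += 1
            self.restart()
            return None


# Usage: a driver acquires resources, then faults in a later call.
log: List[str] = []
drv = IsolatedDriver("net0", restart=lambda: log.append("restarted"))


def open_dev(tracker: ObjectTracker) -> str:
    tracker.acquire("irq:11", lambda: log.append("free irq:11"))
    tracker.acquire("buf:rx", lambda: log.append("free buf:rx"))
    return "ok"


def buggy_xmit(tracker: ObjectTracker) -> str:
    tracker.acquire("buf:tx", lambda: log.append("free buf:tx"))
    raise DriverFault("simulated bad pointer dereference")


print(drv.call(open_dev))    # ok
print(drv.call(buggy_xmit))  # None
print(log)  # ['free buf:tx', 'free buf:rx', 'free irq:11', 'restarted']
```

Releasing resources newest-first mirrors the usual unwind order for nested acquisitions; the abstract does not say whether Nooks orders its cleanup this way, so treat that choice as part of the sketch.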
Pages: 77-110
Page count: 34
Related papers (50 in total; first 10 listed)
  • [1] Intrusion Survivability for Commodity Operating Systems
    Chevalier, Ronny
    Plaquin, David
    Dalton, Chris
    Hiet, Guillaume
    DIGITAL THREATS: RESEARCH AND PRACTICE, 2020, 1 (04)
  • [2] SoK: Rowhammer on Commodity Operating Systems
    Zhang, Zhi
    Chen, Decheng
    Qi, Jiahao
    Cheng, Yueqiang
    Jiang, Shijie
    Lin, Yiyang
    Gao, Yansong
    Nepal, Surya
    Zou, Yi
    Zhang, Jiliang
    Xiang, Yang
    PROCEEDINGS OF THE 19TH ACM ASIA CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, ACM ASIACCS 2024, 2024: 436-452
  • [3] SLIC: An extensibility system for commodity operating systems
    Ghormley, DP
    Petrou, D
    Anderson, TE
    Rodrigues, SH
    PROCEEDINGS OF THE USENIX 1998 ANNUAL TECHNICAL CONFERENCE, 1998: 39-52
  • [4] Improving the Reliability of the Operating System Inside a VM
    Zheng Hao
    Dong Xiaoshe
    Zhu Zhengdong
    Chen Baoke
    Bai Xiuxiu
    Zhang Xingjun
    Wang Endong
    COMPUTER JOURNAL, 2016, 59 (05): 715-740
  • [5] HPMMAP: Lightweight Memory Management for Commodity Operating Systems
    Kocoloski, Brian
    Lange, John
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014
  • [6] Disco: Running commodity operating systems on scalable multiprocessors
    Bugnion, E
    Devine, S
    Govil, K
    Rosenblum, M
    ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1997, 15 (04): 412-447
  • [7] CSR: Core Surprise Removal in Commodity Operating Systems
    Shalev, Noam
    Harpaz, Eran
    Porat, Hagar
    Keidar, Idit
    Weinsberg, Yaron
    ACM SIGPLAN NOTICES, 2016, 51 (04): 773-787
  • [8] On Improving Write Throughput in Commodity MID Systems
    Liu, Jia
    Chen, Xingyu
    Liu, Xiulong
    Zhang, Xiaocong
    Wang, Xia
    Chen, Lijun
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2019), 2019: 1522-1530
  • [9] RELIABILITY PREDICTION FOR CONTINUOUSLY OPERATING SYSTEMS
    PLOTKIN, M
    EINHORN, S
    IEEE TRANSACTIONS ON RELIABILITY, 1965, R-14 (01): 15 ff.
  • [10] Static analysis based invariant detection for commodity operating systems
    Zhu, Feng
    Wei, Jinpeng
    COMPUTERS & SECURITY, 2014, 43: 49-63