Improving the reliability of commodity operating systems

Cited by: 76
Authors
Swift, MM [1]
Bershad, BN [1]
Levy, HM [1]
Affiliation
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Source
ACM TRANSACTIONS ON COMPUTER SYSTEMS | 2005 / Vol. 23 / No. 1
Keywords
reliability; management; recovery; device drivers; virtual memory; protection; I/O
DOI
10.1145/1047915.1047919
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85% of recently reported failures. This article describes Nooks, a reliability subsystem that seeks to greatly enhance operating system (OS) reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, our goal is to prevent the vast majority of driver-caused crashes with little or no change to the existing driver and system code. Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them from corrupting the kernel. Nooks also tracks a driver's use of kernel resources to facilitate automatic cleanup during recovery. To prove the viability of our approach, we implemented Nooks in the Linux operating system and used it to fault-isolate several device drivers. Our results show that Nooks offers a substantial increase in the reliability of operating systems, catching and quickly recovering from many faults that would otherwise crash the system. Under a wide range and number of fault conditions, we show that Nooks recovers automatically from 99% of the faults that otherwise cause Linux to crash. While Nooks was designed for drivers, our techniques generalize to other kernel extensions. We demonstrate this by isolating a kernel-mode file system and an in-kernel Internet service. Overall, because Nooks supports existing C-language extensions, runs on a commodity operating system and hardware, and enables automated recovery, it represents a substantial step beyond the specialized architectures and type-safe languages required by previous efforts directed at safe extensibility.
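The abstract notes that Nooks tracks a driver's use of kernel resources so they can be cleaned up automatically during recovery. The following is a minimal userspace sketch of that idea in C, not Nooks's actual implementation. The names `object_tracker`, `tracked_alloc`, `tracked_free`, and `tracker_release_all` are hypothetical, and `malloc`/`free` stand in for kernel allocation routines. Every allocation made on a driver's behalf is recorded, so a recovery path can release whatever the failed driver still holds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record of one resource handed to an isolated driver. */
typedef struct tracked_obj {
    void *ptr;                 /* the resource itself */
    struct tracked_obj *next;  /* singly linked list of live objects */
} tracked_obj;

/* Hypothetical per-driver tracker (illustrative only). */
typedef struct {
    const char *driver_name;
    tracked_obj *head;
    size_t live;               /* number of objects currently tracked */
} object_tracker;

/* Allocate on the driver's behalf and record the object on success. */
static void *tracked_alloc(object_tracker *t, size_t size) {
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    tracked_obj *rec = malloc(sizeof *rec);
    if (rec == NULL) {
        free(p);
        return NULL;
    }
    rec->ptr = p;
    rec->next = t->head;
    t->head = rec;
    t->live++;
    return p;
}

/* Normal release by the driver: free the object and drop its record.
 * Returns 0 on success, -1 if the pointer is not owned by this driver. */
static int tracked_free(object_tracker *t, void *p) {
    for (tracked_obj **link = &t->head; *link != NULL; link = &(*link)->next) {
        if ((*link)->ptr == p) {
            tracked_obj *rec = *link;
            *link = rec->next;
            free(rec->ptr);
            free(rec);
            t->live--;
            return 0;
        }
    }
    return -1;
}

/* Recovery path: release everything the failed driver still holds.
 * Returns the number of objects released. */
static size_t tracker_release_all(object_tracker *t) {
    assert(t != NULL);
    size_t released = 0;
    while (t->head != NULL) {
        tracked_obj *rec = t->head;
        t->head = rec->next;
        free(rec->ptr);
        free(rec);
        released++;
    }
    t->live = 0;
    return released;
}
```

A linked list keeps the sketch short; a tracker inside a real kernel would likely need faster lookup and locking, both omitted here.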
Pages: 77–110
Page count: 34