Automated Fault-Tolerance Testing

Cited by: 2
Authors
Nagarajan, Adithya [1]
Vaddadi, Ajay [1]
Affiliations
[1] Groupon, Chicago, IL 60654 USA
DOI
10.1109/ICSTW.2016.34
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Software fault tolerance is the ability of computer software to continue normal operation despite system or hardware faults. Most companies are moving towards a microservices-based architecture, in which complex applications are built from a suite of small services that communicate over common protocols such as Hypertext Transfer Protocol (HTTP). While this architecture enables agility in software development and go-to-market, it makes assessing the fault tolerance and resiliency of the overall system a critical challenge. A failure in one dependent service can have an unexpected impact on upstream services, causing severe customer-facing issues. Such issues result from a lack of resiliency in the system's architecture. There is a need for an automated tool that can understand the service architecture and topology and inject faults to assess the fault tolerance and resiliency of the system. In this paper, we present Screwdriver, a new automated solution developed at Groupon to address this need.
Pages: 275-276
Number of pages: 2