Lifeguard: Local Health Awareness for More Accurate Failure Detection

Cited by: 4
Authors
Dadgar, Armon [1]
Phillips, James [1]
Currey, Jon [1]
Affiliations
[1] HashiCorp Inc, San Francisco, CA 94105 USA
DOI
10.1109/DSN-W.2018.00017
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
SWIM is a peer-to-peer group membership protocol with attractive scaling and robustness properties. However, our experience supporting an implementation of SWIM shows that a high rate of false positive failure detections (healthy members being marked as failed) is possible in certain real-world scenarios, and that this is due to SWIM's sensitivity to slow message processing. To address this, we propose a set of extensions to SWIM (together called Lifeguard), which employ heuristic measures of a failure detector's local health. In controlled tests, Lifeguard is able to reduce the false positive rate by more than 50x. Real-world deployment of the extensions has significantly reduced support requests and observed instability. The need for this work points to the fail-stop failure model being overly simplistic for large datacenters, where the likelihood of some nodes experiencing transient CPU starvation, IO flakiness, random packet loss, or other non-crash problems becomes high. With increasing attention being given to these gray failures, we believe the local health abstraction may be applicable in a broad range of settings, including other kinds of distributed failure detectors.
Pages: 22-25
Page count: 4