ImpactTracer: Root Cause Localization in Microservices Based on Fault Propagation Modeling

Cited by: 0
Authors
Xie, Ru [1 ,2 ]
Yang, Jing [1 ]
Li, Jingying [1 ,2 ]
Wang, Liming [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Keywords
microservice; cloud-native; dependability; fault modeling; root cause identification;
DOI
10.23919/DATE56975.2023.10137078
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Microservice architecture is embraced by a growing number of enterprises for its modularity and flexibility. However, because such a system is composed of numerous interdependent microservices, it is prone to cascading failures, and troubleshooting becomes a serious problem: identifying the root cause node and ensuring service availability takes arduous effort. Previous works use call graphs to characterize the causal relationships among microservices, but neither completely nor comprehensively, which leads to an insufficient search of potential root cause nodes and consequently to poor accuracy in culprit localization. In this paper, we propose ImpactTracer to address these problems. ImpactTracer builds an impact graph to provide a complete view of fault propagation in microservices and uses a novel backward tracing algorithm that exhaustively traverses the impact graph to identify the root cause node accurately. Extensive experiments on a real-world dataset demonstrate that ImpactTracer is effective in identifying the root cause node and outperforms the state-of-the-art methods by at least 72%, significantly facilitating troubleshooting in microservices.
Pages: 6
Related papers
50 in total
  • [1] MicroRCA: Root Cause Localization of Performance Issues in Microservices
    Wu, Li
    Tordsson, Johan
    Elmroth, Erik
    Kao, Odej
    [J]. NOMS 2020 - PROCEEDINGS OF THE 2020 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM 2020: MANAGEMENT IN THE AGE OF SOFTWARIZATION AND ARTIFICIAL INTELLIGENCE, 2020,
  • [2] Root cause localization for wind turbines using physics guided multivariate graphical modeling and fault propagation analysis
    Feng, Chenlong
    Liu, Chao
    Jiang, Dongxiang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 295
  • [3] Multi-Layer Observability for Fault Localization in Microservices Based Systems
    Rangaiyengar, Rupashree
    Komondoor, Raghavan
    Medicherla, Raveendra Kumar
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING, SANER, 2023, : 733 - 737
  • [4] A layered framework for root cause diagnosis of microservices
    Bento, Andre
    Correia, Jaime
    Duraes, Joao
    Soares, Joao
    Ribeiro, Luis
    Ferreira, Antonio
    Carreira, Rita
    Araujo, Filipe
    Barbosa, Raul
    [J]. 2021 IEEE 20TH INTERNATIONAL SYMPOSIUM ON NETWORK COMPUTING AND APPLICATIONS (NCA), 2021,
  • [5] ModelCoder: A Fault Model based Automatic Root Cause Localization Framework for Microservice Systems
    Cai, Yang
    Han, Biao
    Li, Jie
    Zhao, Na
    Su, Jinshu
    [J]. 2021 IEEE/ACM 29TH INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS), 2021,
  • [6] Failure Root Cause Analysis for Microservices, Explained
    Soldani, Jacopo
    Forti, Stefano
    Brogi, Antonio
    [J]. DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS (DAIS 2022), 2022, 13272 : 74 - 91
  • [7] A fault localization approach based on fault propagation context
    Yan, Yue
    Jiang, Shujuan
    Zhang, Yanmei
    Zhang, Shenggang
    Zhang, Cheng
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2023, 160
  • [8] Fault propagation behavior study and root cause reasoning with dynamic Bayesian network based framework
    Hu, Jinqiu
    Zhang, Laibin
    Cai, Zhansheng
    Wang, Yu
    Wang, Anqi
    [J]. PROCESS SAFETY AND ENVIRONMENTAL PROTECTION, 2015, 97 : 25 - 36
  • [9] Alarm-Based Root Cause Analysis Based on Weighted Fault Propagation Topology for Distributed Information Network
    LYU Xiaomeng
    CHEN Hao
    WU Zhenyu
    HAN Junhua
    GUO Huifeng
    [J]. ZTE Communications, 2022, 20 (03) : 77 - 84
  • [10] Root cause diagnosis and fault propagation path identification for complex industrial processes based on data space
    Qiao, Liang
    Li, Xueting
    Wang, Xing
    Peng, Kaixiang
    [J]. MEASUREMENT, 2024, 226