Live Debugging of Distributed Systems

Authors
Dao, Darren [1]
Albrecht, Jeannie [2]
Killian, Charles [3]
Vahdat, Amin [1]
Affiliations
[1] Univ Calif San Diego, La Jolla, CA 92093 USA
[2] Williams Coll, Williamstown, MA 01267 USA
[3] Purdue Univ, W Lafayette, IN 47906 USA
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Debugging distributed systems is challenging. Although incremental debugging during development finds some bugs, developers are rarely able to fully test their systems under realistic operating conditions prior to deployment. While deploying a system exposes it to realistic conditions, debugging requires the developer to: (i) detect a bug, (ii) gather the system state necessary for diagnosis, and (iii) sift through the gathered state to determine a root cause. In this paper, we present MaceODB, a tool to assist programmers with debugging deployed distributed systems. Programmers define a set of runtime properties for their system, which MaceODB checks for violations during execution. Once MaceODB detects a violation, it provides the programmer with the information to determine its root cause. We have been able to diagnose several non-trivial bugs in existing mature distributed systems using MaceODB; we discuss two of these bugs in this paper. Benchmarks indicate that the approach has low overhead and is suitable for in situ debugging of deployed systems.
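The workflow in the abstract (define runtime properties, check them during execution, keep the state needed for diagnosis when one fails) can be illustrated with a minimal Python sketch. This is not MaceODB's interface: every name here (Property, Violation, check_properties, successors_are_known) and the ring-overlay example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration; none of these names come from MaceODB.
NodeState = Dict[str, object]      # one node's local state
Snapshot = Dict[str, NodeState]    # node id -> local state


@dataclass
class Property:
    name: str
    check: Callable[[Snapshot], bool]  # returns True when the property holds


@dataclass
class Violation:
    property_name: str
    snapshot: Snapshot  # state captured at detection time, kept for diagnosis


def check_properties(snapshot: Snapshot, properties: List[Property]) -> List[Violation]:
    """Evaluate every property against one snapshot and report the failures."""
    return [
        Violation(p.name, dict(snapshot))
        for p in properties
        if not p.check(snapshot)
    ]


def successors_are_known(snapshot: Snapshot) -> bool:
    """Assumed example property: every node's successor is itself a live node."""
    return all(state["successor"] in snapshot for state in snapshot.values())


if __name__ == "__main__":
    props = [Property("successors_are_known", successors_are_known)]
    # Node "b" points at "c", which is not part of the snapshot.
    snapshot = {"a": {"successor": "b"}, "b": {"successor": "c"}}
    for violation in check_properties(snapshot, props):
        print(violation.property_name, violation.snapshot)
```

In a deployed setting the snapshot would be assembled from state reported by live nodes rather than built by hand; how MaceODB collects and checks that state is not described in the abstract, so the sketch leaves it out.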
Pages: 94 / +
Page count: 2