SherLog: Error Diagnosis by Connecting Clues from Run-time Logs

Cited by: 40
Authors
Yuan, Ding [1]
Mai, Haohui [1]
Xiong, Weiwei [1]
Tan, Lin [2]
Zhou, Yuanyuan [3]
Pasupathy, Shankar
Affiliations
[1] Univ Illinois, Urbana, IL USA
[2] Univ Waterloo, Waterloo, ON N2L 3G1, Canada
[3] Univ Calif San Diego, La Jolla, CA 92093 USA
Keywords
Reliability; Log; Failure Diagnostics; Static Analysis
DOI
10.1145/1735971.1736038
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Computer systems often fail due to factors such as software bugs or administrator errors. Diagnosing such production-run failures is an important but challenging task because they are difficult to reproduce in house for several reasons: (1) users' inputs and file contents are unavailable due to privacy concerns; (2) the exact execution environment is difficult to rebuild; and (3) concurrent executions on multiprocessors are non-deterministic. Programmers therefore often have to diagnose a production-run failure from logs collected from customers and the corresponding source code. Such diagnosis requires expert knowledge, and narrowing down root causes is time-consuming and tedious. To address this problem, we propose a tool called SherLog that analyzes source code, leveraging information provided by run-time logs, to infer what must or may have happened during the failed production run. It requires neither re-execution of the program nor knowledge of the log's semantics. It infers both control and data value information about the failed execution. We evaluate SherLog on 8 representative real-world software failures (6 software bugs and 2 configuration errors) from 7 applications, including 3 servers. The information inferred by SherLog is very useful for programmers diagnosing these failures. Our results also show that SherLog can analyze large server applications such as Apache, with thousands of logging messages, within 40 minutes.
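The distinction the abstract draws between what "must" and what "may" have happened can be pictured with a toy example. The sketch below is only an illustration under strong assumptions: it enumerates a handful of hand-written candidate paths instead of analyzing real source code, and the names `PATHS`, `emitted_logs`, and `infer_must_may` are hypothetical, not part of SherLog. A statement is reported as "must" if it lies on every path whose log output matches the observed log, and as "may" if it lies on only some of them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model: a candidate path is a list of (kind, label) steps,
# where kind is "log" for a logging statement and "stmt" for any other step.
Path = List[Tuple[str, str]]


def emitted_logs(path: Path) -> List[str]:
    """Return the log messages a path would print, in order."""
    return [label for kind, label in path if kind == "log"]


def infer_must_may(paths: Dict[str, Path],
                   observed: List[str]) -> Tuple[Set[str], Set[str]]:
    """Split statements into 'must' and 'may' sets given an observed log."""
    # Keep only the paths whose log output equals the observed log.
    consistent = [p for p in paths.values() if emitted_logs(p) == observed]
    if not consistent:
        return set(), set()
    stmt_sets = [{label for kind, label in p if kind == "stmt"}
                 for p in consistent]
    must = set.intersection(*stmt_sets)   # on every consistent path
    may = set.union(*stmt_sets) - must    # on some consistent paths only
    return must, may


# Hand-written example paths (assumed for illustration).
PATHS: Dict[str, Path] = {
    "a": [("stmt", "open_config"), ("log", "config loaded"),
          ("stmt", "use_cache"), ("log", "request failed")],
    "b": [("stmt", "open_config"), ("log", "config loaded"),
          ("stmt", "skip_cache"), ("log", "request failed")],
    "c": [("stmt", "open_config"), ("log", "config missing")],
}

must, may = infer_must_may(PATHS, ["config loaded", "request failed"])
print(sorted(must))  # ['open_config']
print(sorted(may))   # ['skip_cache', 'use_cache']
```

Path "c" is ruled out because its log output differs from the observed log, so only paths "a" and "b" contribute to the result.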
Pages: 143-154
Page count: 12
Related papers
  • [1] SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
    Yuan, Ding
    Mai, Haohui
    Xiong, Weiwei
    Tan, Lin
    Zhou, Yuanyuan
    Pasupathy, Shankar
    ASPLOS XV: FIFTEENTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2010, : 143 - 154
  • [2] SherLog: Error diagnosis by connecting clues from run-time logs
    Yuan, Ding
    Mai, Haohui
    Xiong, Weiwei
    Tan, Lin
    Zhou, Yuanyuan
    Pasupathy, Shankar
    ACM SIGPLAN Notices, 2010, 45 (03): : 143 - 154
  • [3] The Importance of Run-Time Error Detection
    Luecke, Glenn R.
    Coyle, James
    Hoekstra, James
    Kraeva, Marina
    Xu, Ying
    Park, Mi-Young
    Kleiman, Elizabeth
    Weiss, Olga
    Wehe, Andre
    Yahya, Melissa
    TOOLS FOR HIGH PERFORMANCE COMPUTING 2009, 2010, : 145 - 155
  • [4] AN EFFICIENT RUN-TIME ROUTER FOR CONNECTING MODULES IN FPGAS
    Suris, Jorge
    Patterson, Cameron
    Athanas, Peter
    2008 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE AND LOGIC APPLICATIONS, VOLS 1 AND 2, 2008, : 125 - 130
  • [5] RUN-TIME ERROR CHECKING COMES TO COMPILERS
    APIKI, S
    BYTE, 1995, 20 (10): : 34 - 34
  • [6] Astree: Verification of absence of run-time error
    Mauborgne, L
    BUILDING THE INFORMATION SOCIETY, 2004, 156 : 385 - 392
  • [7] Model-based run-time error detection
    Hooman, Jozef
    Hendriks, Teun
    MODELS IN SOFTWARE ENGINEERING, 2008, 5002 : 225 - 236
  • [8] Finding and preventing run-time error handling mistakes
    Weimer, W
    Necula, GC
    ACM SIGPLAN NOTICES, 2004, 39 (10) : 419 - 431
  • [9] Plug-in technology connecting Simulink model with run-time infrastructure
    Hu T.
    Jin X.
    Shen L.
    Liu L.
    Shen, Liqun (shenliqun@hit.edu.cn), 1600, CIMS (27): : 1422 - 1428
  • [10] Architecture-Based Run-Time Fault Diagnosis
    Casanova, Paulo
    Schmerl, Bradley
    Garlan, David
    Abreu, Rui
    SOFTWARE ARCHITECTURE, 2011, 6903 : 261 - +