Towards Automated Log Parsing for Large-Scale Log Data Analysis

Cited by: 111
Authors
He, Pinjia [1 ,2 ,3 ]
Zhu, Jieming [4 ]
He, Shilin [1 ,2 ,3 ]
Li, Jian [1 ,2 ,3 ]
Lyu, Michael R. [1 ,2 ,3 ]
Affiliations
[1] Chinese Univ Hong Kong, Shenzhen Res Inst, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Shatin, Hong Kong, Peoples R China
[3] CUHK Sub Lab, MoE Key Lab High Confidence Software Technol, Shatin, Hong Kong, Peoples R China
[4] Huawei, Huawei Labs 2012, Shenzhen 518129, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
System management; log parsing; log analysis; parallel computing; clustering; CLOUD;
DOI
10.1109/TDSC.2017.2762673
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Logs are widely used in system management for dependability assurance because they are often the only available data that record detailed system runtime behavior in production. Because the volume of logs is constantly increasing, developers and operators want to automate log analysis with data mining methods, which require structured input data (e.g., matrices). This need has prompted a number of studies on log parsing, which aims to transform free-text log messages into structured events. However, because open-source implementations of these log parsers and benchmarks for comparing their performance are lacking, developers are unlikely to know how effective existing log parsers are or what their limitations are when applying them in practice. They must often reimplement or redesign a parser, which is time-consuming and redundant. In this paper, we first present a characterization study of the current state-of-the-art log parsers and evaluate their efficacy on five real-world datasets with over ten million log messages. We find that, although the overall accuracy of these parsers is high, they are not robust across all datasets. When logs grow to a large scale (e.g., 200 million log messages), which is common in practice, these parsers are not efficient enough to handle the data on a single computer. To address these limitations, we design and implement a parallel log parser, POP, on top of Spark, a large-scale data processing platform. We conduct comprehensive experiments to evaluate POP on both synthetic and real-world datasets. The evaluation results demonstrate the capability of POP in terms of accuracy, efficiency, and effectiveness on subsequent log mining tasks.
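To make the notion of log parsing concrete, the following is a minimal sketch of turning free-text log messages into structured events and then into a count vector of the kind a data mining method could consume. It is not POP or any parser evaluated in the paper: the sample messages, the masking rules in `VARIABLE_PATTERNS`, and the function names are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical sample messages; they are not taken from the paper's datasets.
LOGS = [
    "Received block blk_3587 of size 67108864 from 10.251.42.84",
    "Received block blk_5142 of size 67108864 from 10.251.43.21",
    "Deleting block blk_3587 file /data/blk_3587",
]

# Assumed masking rules: a token that fully matches one of these
# patterns is treated as a variable part of the message.
VARIABLE_PATTERNS = [
    re.compile(r"^\d+\.\d+\.\d+\.\d+$"),  # IPv4-like address
    re.compile(r"^blk_-?\d+$"),           # block identifier
    re.compile(r"^/\S+$"),                # absolute path
    re.compile(r"^-?\d+$"),               # plain integer
]


def to_template(message):
    """Replace variable-looking tokens with the wildcard <*>."""
    tokens = message.split()
    masked = [
        "<*>" if any(p.match(t) for p in VARIABLE_PATTERNS) else t
        for t in tokens
    ]
    return " ".join(masked)


def parse(messages):
    """Map each message to an event ID; return (template -> ID, event IDs)."""
    templates = {}
    events = []
    for message in messages:
        template = to_template(message)
        # A new template gets the next free ID; a known one keeps its ID.
        event_id = templates.setdefault(template, len(templates))
        events.append(event_id)
    return templates, events


def count_vector(events, n_templates):
    """Count occurrences of each event ID (one row of an event count matrix)."""
    counts = Counter(events)
    return [counts.get(i, 0) for i in range(n_templates)]


templates, events = parse(LOGS)
vector = count_vector(events, len(templates))
```

Here the first two messages collapse into one template and the third forms a second, so the three raw lines become two event types plus a numeric vector. Parsers of the kind the abstract describes learn the constant and variable parts from the data rather than relying on handwritten patterns like these.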
Pages: 931-944
Number of pages: 14