On the effectiveness of log representation for log-based anomaly detection

Authors
Xingfang Wu
Heng Li
Foutse Khomh
Affiliation
[1] Polytechnique Montreal, Department of Computer Engineering and Software Engineering
Keywords
Log representation; Anomaly detection; Automated log analysis;
DOI
Not available
Abstract
Logs are an essential source of information for understanding the running status of a software system. As modern software architectures and maintenance methods evolve, more research effort has been devoted to automated log analysis. In particular, machine learning (ML) has been widely used in log analysis tasks. In ML-based log analysis, converting textual log data into numerical feature vectors is a critical and indispensable step. However, the impact of different log representation techniques on the performance of downstream models is not clear, which makes it hard for researchers and practitioners to choose the optimal log representation technique for their automated log analysis workflows. Therefore, this work investigates and compares the log representation techniques commonly adopted in previous log analysis research. Specifically, we select six log representation techniques and evaluate them with seven ML models and four public log datasets (i.e., HDFS, BGL, Spirit, and Thunderbird) in the context of log-based anomaly detection. We also examine the impact of the log parsing process and of different feature aggregation approaches when they are employed with log representation techniques. From the experiments, we provide heuristic guidelines for future researchers and developers to follow when designing an automated log analysis workflow. We believe our comprehensive comparison can help researchers and practitioners better understand the characteristics of different log representation techniques and guide them in selecting the most suitable ones for their ML-based log analysis workflows.
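To make the step described in the abstract concrete, the following is a minimal sketch of turning textual log messages into numerical feature vectors and then aggregating them per session. It uses a plain bag-of-words count and mean pooling purely as an illustration; the abstract does not name the six techniques studied, so this is not necessarily one of them. The function names, session identifiers, and log lines are all hypothetical.

```python
from collections import Counter, defaultdict

def build_vocabulary(messages):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for msg in messages for tok in msg.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(message, vocab):
    """Turn one log message into a token-count vector over the vocabulary."""
    counts = Counter(message.lower().split())
    return [counts.get(tok, 0) for tok in vocab]

def aggregate_by_session(records, vocab):
    """Average the message vectors within each session (mean pooling)."""
    grouped = defaultdict(list)
    for session_id, message in records:
        grouped[session_id].append(vectorize(message, vocab))
    return {
        sid: [sum(col) / len(vectors) for col in zip(*vectors)]
        for sid, vectors in grouped.items()
    }

# Hypothetical log lines, invented for illustration only.
records = [
    ("s1", "Receiving block blk_1"),
    ("s1", "Received block blk_1"),
    ("s2", "Exception while serving block blk_2"),
]
vocab = build_vocabulary(msg for _, msg in records)
session_vectors = aggregate_by_session(records, vocab)
```

Each entry of session_vectors is a fixed-length vector that a downstream ML model could consume. Mean pooling is only one possible aggregation choice; the abstract states that different aggregation approaches were compared but does not list them.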
Related papers
  • [1] On the effectiveness of log representation for log-based anomaly detection
    Wu, Xingfang
    Li, Heng
    Khomh, Foutse
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2023, 28 (06)
  • [2] Log-based Anomaly Detection Without Log Parsing
    Van-Hoang Le
    Zhang, Hongyu
    [J]. 2021 36TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING ASE 2021, 2021 : 492 - 504
  • [3] Leveraging Log Instructions in Log-based Anomaly Detection
    Bogatinovski, Jasmin
    Madjarov, Gjorgji
    Nedelkoski, Sasho
    Cardoso, Jorge
    Kao, Odej
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (IEEE SCC 2022), 2022 : 321 - 326
  • [4] LogEncoder: Log-Based Contrastive Representation Learning for Anomaly Detection
    Qi, Jiaxing
    Luan, Zhongzhi
    Huang, Shaohan
    Fung, Carol
    Yang, Hailong
    Li, Hanlu
    Zhu, Danfeng
    Qian, Depei
    [J]. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2023, 20 (02) : 1378 - 1391
  • [5] Robust Log-Based Anomaly Detection on Unstable Log Data
    Zhang, Xu
    Xu, Yong
    Lin, Qingwei
    Qiao, Bo
    Zhang, Hongyu
    Dang, Yingnong
    Xie, Chunyu
    Yang, Xinsheng
    Cheng, Qian
    Li, Ze
    Chen, Junjie
    He, Xiaoting
    Yao, Randolph
    Lou, Jian-Guang
    Chintalapati, Murali
    Shen, Furao
    Zhang, Dongmei
    [J]. ESEC/FSE'2019: PROCEEDINGS OF THE 2019 27TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, 2019 : 807 - 817
  • [6] Review on Log-Based Anomaly Detection Techniques
    Raut, Pooja
    Mishra, Akanksha
    Rao, Shreya
    Kawoor, Saloni
    Shelke, Sushila
    Deore, Mahendra
    Kumar, Vivek
    [J]. PROCEEDINGS OF SECOND INTERNATIONAL CONFERENCE ON SUSTAINABLE EXPERT SYSTEMS (ICSES 2021), 2022, 351 : 893 - 906
  • [7] An empirical study of the impact of log parsers on the performance of log-based anomaly detection
    Fu, Ying
    Yan, Meng
    Xu, Zhou
    Xia, Xin
    Zhang, Xiaohong
    Yang, Dan
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2023, 28 (01)
  • [8] An empirical study of the impact of log parsers on the performance of log-based anomaly detection
    Ying Fu
    Meng Yan
    Zhou Xu
    Xin Xia
    Xiaohong Zhang
    Dan Yang
    [J]. Empirical Software Engineering, 2023, 28
  • [9] Transfer Log-based Anomaly Detection with Pseudo Labels
    Huang, Shaohan
    Liu, Yi
    Fung, Carol
    He, Rong
    Zhao, Yining
    Yang, Hailong
    Luan, Zhongzhi
    [J]. 2020 16TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), 2020
  • [10] An unsupervised heterogeneous log-based framework for anomaly detection
    Hajamydeen, Asif Iqbal
    Udzir, Nur Izura
    Mahmod, Ramlan
    Abdul Ghani, Abdul Azim
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2016, 24 (03) : 1117 - 1134