LogRep: Log-based Anomaly Detection by Representing both Semantic and Numeric Information in Raw Messages

Cited by: 0
Authors
Xie, Xiaoda [1]
Jiang, Songlei [1]
Huang, Chenlin [1]
Yu, Fengyuan [1]
Deng, Yunjia [2]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[2] China United Network Commun Co Ltd, Hunan Branch, Changsha, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Log-based anomaly detection; Log representation learning; Limited training data; Log heterogeneity; Log data analysis; LARGE-SCALE;
DOI
10.1109/ISSRE59848.2023.00015
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Log-based anomaly detection plays an essential role in system reliability fields such as software reliability and network reliability. System log data is semi-structured and heterogeneous: each message contains both semantic parts and numeric variables, and both can reflect abnormal system behavior. However, existing log-based anomaly detection methods fail to capture the numeric information in raw data, so their performance degrades substantially when only limited labeled data is available. To capture both semantic and numeric information and thereby enhance anomaly detection, we propose LogRep, a novel representation-based log anomaly detection method whose learned representations encode both kinds of information. LogRep introduces a position-aware numeric representation learning module and an attention-based representation fusion module to address the heterogeneity of log data. Owing to the high quality of the learned log representations, LogRep achieves anomaly detection performance comparable to state-of-the-art (SOTA) methods while using two orders of magnitude less training data. When the training data is reduced, the performance of SOTA methods drops sharply, whereas LogRep remains stable on two public datasets (HDFS and BGL) and one self-collected dataset. Specifically, with only 1% of the training data available, LogRep improves the F1 score over the second-best method by 10.6% on BGL and 5.8% on HDFS.
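To make the abstract's distinction between semantic parts and numeric variables concrete, the following is a minimal sketch of splitting one raw log message into semantic tokens and position-tagged numeric values. It is an illustration under stated assumptions, not LogRep's actual preprocessing: the function name `split_log_message`, the `<NUM>` placeholder, the whitespace tokenization, and the sample message are all hypothetical.

```python
import re

# Assumption: a "numeric variable" is a whitespace-delimited token that is
# an optionally signed integer or decimal. Real logs need richer rules
# (hex IDs, IP addresses, timestamps), which this sketch ignores.
NUMBER = re.compile(r"^-?\d+(?:\.\d+)?$")


def split_log_message(message):
    """Return (semantic_tokens, numeric_pairs) for one raw log message.

    semantic_tokens keeps the word order, with each number replaced by a
    hypothetical "<NUM>" placeholder. numeric_pairs lists (position, value)
    so the token position of every number is preserved.
    """
    semantic_tokens = []
    numeric_pairs = []
    for position, token in enumerate(message.split()):
        if NUMBER.match(token):
            numeric_pairs.append((position, float(token)))
            semantic_tokens.append("<NUM>")
        else:
            semantic_tokens.append(token)
    return semantic_tokens, numeric_pairs


# Hypothetical HDFS-style message used only for illustration.
semantic, numeric = split_log_message(
    "Received block 4821 of size 67108864 from node 12"
)
print(semantic)
print(numeric)
```

Keeping the token position alongside each value is what would let a downstream model tell apart, say, a block size from a node number, which is the intuition behind a position-aware numeric representation.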
Pages: 195-206
Page count: 12