Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models

Cited: 9
Authors
Ott, Harold [1 ]
Bogatinovski, Jasmin [1 ]
Acker, Alexander [1 ]
Nedelkoski, Sasho [1 ]
Kao, Odej [1 ]
Affiliations
[1] TU Berlin, Distributed & Operating Syst, Berlin, Germany
Keywords
anomaly detection; log analysis; deep learning; language models; transfer learning
DOI
10.1109/CloudIntelligence52565.2021.00013
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Anomalies or failures in large computer systems, such as the cloud, affect a large number of users who communicate, compute, and store information. Timely and accurate anomaly detection is therefore necessary for reliability, security, safe operation, and mitigation of losses in these increasingly important systems. Recently, the evolution of the software industry has opened up several problems that need to be tackled, including (1) addressing software evolution due to software upgrades, and (2) solving the cold-start problem, where data from the system of interest is not available. In this paper, we propose a framework for anomaly detection in log data, a major source of system information for troubleshooting. To that end, we utilize pre-trained general-purpose language models to preserve the semantics of log messages and map them into log vector embeddings. The key idea is that these log representations are robust and less sensitive to changes in the logs, and therefore result in better generalization of the anomaly detection models. We perform several experiments on a cloud dataset, evaluating different language models, such as BERT, GPT-2, and XL, for obtaining numerical log representations. Robustness is evaluated by gradually altering log messages to simulate a change in semantics. Our results show that the proposed approach achieves high performance and robustness, which opens up possibilities for future research in this direction.
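The abstract describes two steps that a short sketch can make concrete: mapping raw log messages into vector embeddings with a pre-trained language model, and checking how much the embedding moves when a log line is slightly reworded (as in the robustness evaluation). The snippet below is a minimal illustration only, assuming a Hugging Face BERT checkpoint (bert-base-uncased), mean pooling over token states, and two invented example log lines; the paper's actual models, pooling strategy, and downstream anomaly detector are not specified here and may differ.

```python
# Hedged sketch: log messages -> embeddings with a pre-trained LM,
# plus a toy robustness check on a slightly altered log line.
# Checkpoint, pooling, and example logs are assumptions for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint, not taken from the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed_logs(log_messages):
    """Map raw log messages to fixed-size vectors by mean-pooling the
    last hidden states of the pre-trained model over non-padding tokens."""
    enc = tokenizer(log_messages, padding=True, truncation=True,
                    max_length=64, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    hidden = out.last_hidden_state                       # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()   # mask out padding
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (batch, dim)

# Toy usage: an original log line and an altered one, mimicking the kind of
# wording change a software upgrade might introduce.
logs = [
    "Connection to node-17 lost, retrying in 5 seconds",
    "Connection to node-17 dropped, retry scheduled in 5 seconds",
]
vecs = embed_logs(logs)
cos = torch.nn.functional.cosine_similarity(vecs[0], vecs[1], dim=0)
print(f"cosine similarity between original and altered log: {cos.item():.3f}")
```

A downstream anomaly detector would then be trained on such embeddings; the claim in the abstract is that semantics-preserving representations keep the detector useful even as the exact wording of log messages drifts.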
Pages: 19-24
Number of pages: 6
Related Papers
50 records in total
  • [1] BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model
    Chen, Song
    Liao, Hai
    APPLIED ARTIFICIAL INTELLIGENCE, 2022, 36 (01)
  • [2] Robust Lottery Tickets for Pre-trained Language Models
    Zheng, Rui
    Bao, Rong
    Zhou, Yuhao
    Liang, Di
Wang, Sirui
    Wu, Wei
    Gui, Tao
    Zhang, Qi
    Huang, Xuanjing
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 2211 - 2224
  • [3] Sprelog: Log-Based Anomaly Detection with Self-matching Networks and Pre-trained Models
    Yang, Haitian
    Zhao, Xuan
    Sun, Degang
    Wang, Yan
    Huang, Weiqing
    SERVICE-ORIENTED COMPUTING (ICSOC 2021), 2021, 13121 : 736 - 743
  • [4] Emotional Paraphrasing Using Pre-trained Language Models
    Casas, Jacky
    Torche, Samuel
    Daher, Karl
    Mugellini, Elena
    Abou Khaled, Omar
    2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW), 2021,
  • [5] Pre-Trained Language Models and Their Applications
    Wang, Haifeng
    Li, Jiwei
    Wu, Hua
    Hovy, Eduard
    Sun, Yu
    ENGINEERING, 2023, 25 : 51 - 65
  • [6] Adapting Pre-trained Language Models to Rumor Detection on Twitter
    Slimi, Hamda
    Bounhas, Ibrahim
    Slimani, Yahya
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2021, 27 (10) : 1128 - 1148
  • [7] A Data Cartography based MixUp for Pre-trained Language Models
    Park, Seo Yeon
    Caragea, Cornelia
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4244 - 4250
  • [8] Pre-trained Language Models with Limited Data for Intent Classification
    Kasthuriarachchy, Buddhika
    Chetty, Madhu
    Karmakar, Gour
    Walls, Darren
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [9] μBERT: Mutation Testing using Pre-Trained Language Models
    Degiovanni, Renzo
    Papadakis, Mike
    2022 IEEE 15TH INTERNATIONAL CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION WORKSHOPS (ICSTW 2022), 2022, : 160 - 169
  • [10] Devulgarization of Polish Texts Using Pre-trained Language Models
    Klamra, Cezary
    Wojdyga, Grzegorz
    Zurowski, Sebastian
    Rosalska, Paulina
    Kozlowska, Matylda
    Ogrodniczuk, Maciej
    COMPUTATIONAL SCIENCE, ICCS 2022, PT II, 2022, : 49 - 55