Detecting log anomaly using subword attention encoder and probabilistic feature selection

Times cited: 0

Authors:

M. Hariharan

Abhinesh Mishra

Sriram Ravi

Ankita Sharma

Anshul Tanwar

Krishna Sundaresan

Prasanna Ganesan

R. Karthik

Affiliations:

[1] Cisco Systems India Pvt Ltd, Center for Cyber Physical Systems

[2] Vellore Institute of Technology

Source:

Applied Intelligence | 2023 / Vol. 53

Keywords:

Deep learning; Self-attention; Naive Bayes; Syslog; Anomaly detection; Encoder-decoder

DOI:

Not available

Abstract:

A log anomaly is a manifestation of a software system error or a security threat. Detecting such unusual behaviours across logs in real time is the driving force behind large-scale autonomous monitoring technology that can rapidly raise alerts on zero-day attacks. Increasingly, AI methods are being used to process voluminous log datasets and reveal patterns of correlated anomalies. In this paper, we propose an enhanced approach to learning semantic-aware embeddings for logs, called the Subword Encoder Neural network (SEN). Addressing a key limitation of previous work on semantic log parsing, the proposed work introduces the concept of learning word vectors at subword-level granularity using an attention encoder strategy. The learnt embeddings reflect contextual and lexical relationships at the word level. As a result, the learnt word representations accurately capture new log messages not previously seen by the model. Furthermore, we develop a novel feature distillation algorithm termed the Naive Bayes Feature Selector (NBFS) to extract useful log events. This probabilistic technique examines the occurrence pattern of events and selects only the salient ones that can aid anomaly detection. To the best of our knowledge, this is the first attempt to assign affinity to log events based on the target task. Since the predictions can be traced back to the log messages, the model is also inherently explainable. The model outperforms state-of-the-art methods by a fair margin, achieving a detection F1-score of 0.99 on the benchmark BGL, HDFS and OpenStack log datasets.
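
The two mechanisms described in the abstract can be illustrated with a toy sketch. This is not the authors' SEN or NBFS implementation: the fastText-style character n-gram decomposition, the Bernoulli naive Bayes log-ratio score, and the helper names `char_ngrams`, `subword_jaccard`, `nb_event_scores` and `select_events` are all hypothetical assumptions, and the attention encoder itself is not modelled. The first part shows why subword units help with unseen words (an unseen token such as `errors` still shares half of its n-grams with a seen token `error`); the second ranks log events by how strongly their presence in a session shifts the odds toward the anomalous class, then keeps the top k.

```python
# Hypothetical sketch, not the paper's SEN/NBFS code.
import math
from collections import Counter


def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> set:
    """Return the set of character n-grams of `word`, with '<' and '>' marking
    the word boundaries (a fastText-style decomposition, assumed here)."""
    padded = f"<{word}>"
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def subword_jaccard(a: str, b: str) -> float:
    """Fraction of subword units shared by two words (Jaccard similarity)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


def nb_event_scores(sequences, labels, alpha: float = 1.0) -> dict:
    """Score each event by log P(present | anomalous) - log P(present | normal),
    using Laplace-smoothed Bernoulli naive Bayes estimates (an assumption)."""
    n_anom = sum(labels)
    n_norm = len(labels) - n_anom
    present_anom, present_norm = Counter(), Counter()
    for seq, label in zip(sequences, labels):
        counts = present_anom if label else present_norm
        counts.update(set(seq))  # count each event at most once per sequence
    scores = {}
    for event in set(present_anom) | set(present_norm):
        p_anom = (present_anom[event] + alpha) / (n_anom + 2 * alpha)
        p_norm = (present_norm[event] + alpha) / (n_norm + 2 * alpha)
        scores[event] = math.log(p_anom / p_norm)
    return scores


def select_events(sequences, labels, k: int, alpha: float = 1.0) -> list:
    """Keep the k events whose presence shifts the class odds the most,
    in either direction; ties are broken by event ID."""
    scores = nb_event_scores(sequences, labels, alpha)
    return sorted(scores, key=lambda e: (-abs(scores[e]), e))[:k]


# Toy event sequences (event IDs per session); label 1 marks an anomalous session.
sequences = [["E1", "E2", "E3"], ["E1", "E2"], ["E1", "E3"],
             ["E1", "E9"], ["E1", "E9", "E2"]]
labels = [0, 0, 0, 1, 1]
print(round(subword_jaccard("error", "errors"), 2))  # prints 0.5
print(select_events(sequences, labels, k=2))  # prints ['E9', 'E3']
```

Here `E1` appears in every session, so its score is close to zero and it is discarded, while `E9` (seen only in anomalous sessions) and `E3` (seen only in normal ones) are kept. Ranking by absolute score is a design choice assumed for this sketch; the smoothing constant `alpha` and the cut-off `k` are likewise free parameters, not values taken from the paper.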

Pages: 22297-22312

Page count: 15