Efficient anomaly detection in tabular cybersecurity data using large language models

被引：0

作者：

Xiaoyong Zhao ^{[1
]}

Xingxin Leng ^{[1
]}

Lei Wang ^{[1
]}

Ningning Wang ^{[1
]}

Yanqiong Liu ^{[1
]}

机构：

[1] Beijing Information Science and Technology University,

来源：

Scientific Reports | / 15卷 / 1期

关键词：

Anomaly detection; Large language models; Network security; Prompt engineering; Tabular data;

D O I：

10.1038/s41598-025-88050-z

中图分类号：

学科分类号：

摘要：

In cybersecurity, anomaly detection in tabular data is essential for ensuring information security. While traditional machine learning and deep learning methods have shown some success, they continue to face significant challenges in terms of generalization. To address these limitations, this paper presents an innovative method for tabular data anomaly detection based on large language models, called “Tabular Anomaly Detection via Guided Prompts” (TAD-GP). This approach utilizes a 7-billion-parameter open-source model and incorporates strategies such as data sample introduction, anomaly type recognition, chain-of-thought reasoning, multi-turn dialogue, and key information reinforcement. Experimental results indicate that the TAD-GP framework improves F1 scores by 79.31%, 97.96%, and 59.09% on the CICIDS2017, KDD Cup 1999, and UNSW-NB15 datasets, respectively. Furthermore, the smaller-scale TAD-GP model outperforms larger models across multiple datasets, demonstrating its practical potential in environments with constrained computational resources and requirements for private deployment. This method addresses a critical gap in research on anomaly detection in cybersecurity, specifically using small-scale open-source models.

引用

共 50 条

[21] Prototype-oriented hypergraph representation learning for anomaly detection in tabular data
Li, Shu
Lu, Yi
Jiu, Shicheng
Huang, Haoxiang
Yang, Guangqi
Yu, Jiong
INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)
[22] OAB - An Open Anomaly Benchmark Framework for Unsupervised and Semisupervised Anomaly Detection on Image and Tabular Data Sets
Lohrer, Andreas
Deller, Jan
Huenemoerder, Maximilian
Kroeger, Peer
21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021, : 991 - 1000
[23] Bias and Cyberbullying Detection and Data Generation Using Transformer Artificial Intelligence Models and Top Large Language Models
Kumar, Yulia
Huang, Kuan
Perez, Angelo
Yang, Guohao
Li, J. Jenny
Morreale, Patricia
Kruger, Dov
Jiang, Raymond
ELECTRONICS, 2024, 13 (17)
[24] Deception and Lie Detection Using Reduced Linguistic Features, Deep Models and Large Language Models for Transcribed Data
1600, Institute of Electrical and Electronics Engineers Inc.
[25] TabSAL: Synthesizing Tabular data with Small agent Assisted Language models
Li, Jiale
Qian, Run
Tan, Yandan
Li, Zhixin
Chen, Luyu
Liu, Sen
Wu, Jie
Chai, Hongfeng
KNOWLEDGE-BASED SYSTEMS, 2024, 304
[26] Socially Aware Synthetic Data Generation for Suicidal Ideation Detection Using Large Language Models
Ghanadian, Hamideh
Nejadgholi, Isar
Al Osman, Hussein
IEEE ACCESS, 2024, 12 : 14350 - 14363
[27] LogFiT: Log Anomaly Detection Using Fine-Tuned Language Models
Almodovar, Crispin
Sabrina, Fariza
Karimi, Sarvnaz
Azad, Salahuddin
IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2024, 21 (02): : 1715 - 1723
[28] Special Issue "AI for Cybersecurity: Robust Models for Authentication, Threat and Anomaly Detection"
Bergadano, Francesco
Giacinto, Giorgio
ALGORITHMS, 2023, 16 (07)
[29] Code Detection for Hardware Acceleration Using Large Language Models
Martinez, Pablo Antonio
Bernabe, Gregorio
Garcia, Jose Manuel
IEEE ACCESS, 2024, 12 : 35271 - 35281
[30] Experience with anomaly detection using ensemble models on streaming data at HIPA
de Portugal, Jaime Coello
Snuverink, Jochem
NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION A-ACCELERATORS SPECTROMETERS DETECTORS AND ASSOCIATED EQUIPMENT, 2021, 1020

← 1 2 3 4 5 →