Efficient anomaly detection in tabular cybersecurity data using large language models

被引:0
|
作者
Xiaoyong Zhao [1 ]
Xingxin Leng [1 ]
Lei Wang [1 ]
Ningning Wang [1 ]
Yanqiong Liu [1 ]
机构
[1] Beijing Information Science and Technology University,
关键词
Anomaly detection; Large language models; Network security; Prompt engineering; Tabular data;
D O I
10.1038/s41598-025-88050-z
中图分类号
学科分类号
摘要
In cybersecurity, anomaly detection in tabular data is essential for ensuring information security. While traditional machine learning and deep learning methods have shown some success, they continue to face significant challenges in terms of generalization. To address these limitations, this paper presents an innovative method for tabular data anomaly detection based on large language models, called “Tabular Anomaly Detection via Guided Prompts” (TAD-GP). This approach utilizes a 7-billion-parameter open-source model and incorporates strategies such as data sample introduction, anomaly type recognition, chain-of-thought reasoning, multi-turn dialogue, and key information reinforcement. Experimental results indicate that the TAD-GP framework improves F1 scores by 79.31%, 97.96%, and 59.09% on the CICIDS2017, KDD Cup 1999, and UNSW-NB15 datasets, respectively. Furthermore, the smaller-scale TAD-GP model outperforms larger models across multiple datasets, demonstrating its practical potential in environments with constrained computational resources and requirements for private deployment. This method addresses a critical gap in research on anomaly detection in cybersecurity, specifically using small-scale open-source models.
引用
下载
收藏
相关论文
共 50 条
  • [1] Data-Efficient and Interpretable Tabular Anomaly Detection
    Chang, Chun-Hao
    Yoon, Jinsung
    Arik, Sercan O.
    Udell, Madeleine
    Pfister, Tomas
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 190 - 201
  • [2] Large Language Models for Tabular Data: Progresses and Future Directions
    Dong, Haoyu
    Wang, Zhiruo
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2997 - 3000
  • [3] Semantic anomaly detection with large language models
    Amine Elhafsi
    Rohan Sinha
    Christopher Agia
    Edward Schmerling
    Issa A. D. Nesnas
    Marco Pavone
    Autonomous Robots, 2023, 47 : 1035 - 1055
  • [4] Semantic anomaly detection with large language models
    Elhafsi, Amine
    Sinha, Rohan
    Agia, Christopher
    Schmerling, Edward
    Nesnas, Issa A. D.
    Pavone, Marco
    AUTONOMOUS ROBOTS, 2023, 47 (08) : 1035 - 1055
  • [5] Interactive Anomaly Detection in Mixed Tabular Data using Bayesian Networks
    Dufraisse, Evan
    Leray, Philippe
    Nedellec, Raphael
    Benkhelif, Tarek
    INTERNATIONAL CONFERENCE ON PROBABILISTIC GRAPHICAL MODELS, VOL 138, 2020, 138 : 185 - 196
  • [6] Efficient Detection of Toxic Prompts in Large Language Models
    Liu, Yi
    Yu, Junzhe
    Sun, Huijia
    Shi, Ling
    Deng, Gelei
    Chen, Yuqi
    Liu, Yang
    arXiv, 1600,
  • [7] Leveraging Large Language Models and BERT for Log Parsing and Anomaly Detection
    Zhou, Yihan
    Chen, Yan
    Rao, Xuanming
    Zhou, Yukang
    Li, Yuxin
    Hu, Chao
    MATHEMATICS, 2024, 12 (17)
  • [8] Corruption-based anomaly detection and interpretation in tabular data
    Mok, Chunghyup
    Kim, Seoung Bum
    Pattern Recognition, 2025, 159
  • [9] Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models
    Ott, Harold
    Bogatinovski, Jasmin
    Acker, Alexander
    Nedelkoski, Sasho
    Kao, Odej
    2021 IEEE/ACM INTERNATIONAL WORKSHOP ON CLOUD INTELLIGENCE (CLOUDINTELLIGENCE 2021), 2021, : 19 - 24
  • [10] The implementation solution for automatic visualization of tabular data in relational databases based on large language models
    Yang, Hao
    Yang, Zhaoyong
    Zhao, Ruyang
    Li, Xiaoran
    Rao, Gaoqi
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 175 - 180