Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets

Cited by: 2
Authors
Habeb, Mohamed H. [1]
Salama, May [1]
Elrefaei, Lamiaa A. [1]
Affiliation
[1] Benha Univ, Fac Engn Shoubra, Elect Engn Dept, Cairo 11629, Egypt
Keywords
video anomaly detection; unsupervised learning; spatiotemporal modeling; large datasets; localization; recognition; histograms; extraction
DOI
10.3390/a17070286
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This work introduces an unsupervised framework for video anomaly detection built on a hybrid deep learning model that combines a vision transformer (ViT) with a convolutional spatiotemporal relationship (STR) attention block. The model addresses a central challenge of anomaly detection in video surveillance: capturing both local and global relationships within video frames, which traditional convolutional neural networks (CNNs) often struggle with because of their localized field of view. A pre-trained ViT serves as the encoder for feature extraction, and its output is processed by the STR attention block to improve the detection of spatiotemporal relationships among objects in the video. The novelty of this work lies in combining the ViT with STR attention to detect anomalies effectively in large, heterogeneous datasets, which matters given the diverse environments and scenarios encountered in real-world surveillance. The framework was evaluated on three benchmark datasets, UCSD-Ped2, CUHK Avenue, and ShanghaiTech, achieving area under the receiver operating characteristic curve (AUC ROC) values of 95.6%, 86.8%, and 82.1%, respectively. These results show that the model outperforms state-of-the-art methods and indicate its potential to significantly enhance automated video surveillance systems. To demonstrate the framework's effectiveness on extra-large datasets, we trained the model on a subset of the recent, very large CHAD dataset, which contains over 1 million frames, achieving AUC ROC values of 71.8% and 64.2% for CHAD-Cam 1 and CHAD-Cam 2, respectively, again outperforming state-of-the-art techniques.
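The AUC ROC values reported above summarize how well per-frame anomaly scores separate anomalous frames from normal ones. As a minimal, hedged sketch (not the authors' evaluation code), frame-level AUC can be computed from a list of anomaly scores and binary ground-truth labels using the Mann-Whitney rank formulation below. The function name `frame_level_auc` is hypothetical, and the abstract does not say whether scores are pooled over all test frames or averaged per video, so that choice is left to the caller.

```python
from typing import Sequence


def frame_level_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: per-frame anomaly scores (higher = more anomalous)
    labels: per-frame ground truth (1 = anomalous, 0 = normal)
    Tied scores receive average ranks, so a tie between an anomalous and a
    normal frame counts as half a correct ordering (same as trapezoidal AUC).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined without both normal and anomalous frames")

    # Sort frame indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the anomalous class, normalized to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)
```

For example, `frame_level_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75; multiplying by 100 puts the result on the same scale as the figures quoted in the abstract. The rank formulation is used here because it equals the area under the empirical ROC curve without building the curve explicitly and needs only the standard library.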
Pages: 31