CViT: A Convolution Vision Transformer for Video Abnormal Behavior Detection and Localization

Cited by: 0
Authors
Roka S. [1]
Diwakar M. [1, 2]
Affiliations
[1] CSE Department, Graphic Era deemed to be University, Dehradun
[2] Graphic Era Hill University, Dehradun
Keywords
Abnormal behavior; Abnormality; Anomaly detection; AUC; EER; Normal; Transformer; YOLO
DOI
10.1007/s42979-023-02294-y
Abstract
Video anomaly detection is a challenging task because abnormal events are rare, irregular, and unbounded. Most current approaches rely only on CNNs, but because of their spatial inductive bias, CNNs extract only local features from images, which is insufficient for video anomaly detection. Transformer-based approaches have recently become popular thanks to their global self-attention mechanism and are considered alternatives to CNN convolution for sequence-to-sequence anomaly detection. However, because transformers lack adequate low-level information, their localization ability is limited. In this paper, we propose a new approach built on the CViT block. We fuse U-Net with a transformer and modify the encoder by stacking CViT blocks one after another. This combination allows our model to extract richer local and global features from RGB frames. Our approach contains two modules. The anomaly detection module detects abnormal frames using PSNR and an anomaly score. The anomaly localization module accepts only the list of abnormal frames and uses the object detection algorithm YOLO to highlight abnormal objects. We first evaluated our approach on our own custom dataset, GEU, and for comparison we used the standard benchmark datasets UCSD, CUHK Avenue, and ShanghaiTech. The comparative results show that our approach performs better at detecting abnormal events. © 2023, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
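The abstract names PSNR and an anomaly score but does not give their formulas. The following is a minimal sketch of how such a detection step might look, assuming the standard PSNR definition and a min-max normalization of per-frame PSNR over a clip, as is common in frame-prediction-based detectors. The function names, the threshold of 0.5, and the synthetic frames are hypothetical and are not taken from the paper.

```python
import numpy as np


def psnr(generated, target, max_val=1.0, eps=1e-10):
    """Standard PSNR (in dB) between two frames with values in [0, max_val]."""
    diff = generated.astype(np.float64) - target.astype(np.float64)
    mse = max(float(np.mean(diff ** 2)), eps)  # eps avoids division by zero
    return 10.0 * np.log10(max_val ** 2 / mse)


def anomaly_scores(psnr_values):
    """Assumed scoring: min-max normalize PSNR over the clip, then invert,
    so that low PSNR (poorly generated frame) gives a score near 1."""
    p = np.asarray(psnr_values, dtype=np.float64)
    span = p.max() - p.min()
    if span == 0:
        return np.zeros_like(p)
    return 1.0 - (p - p.min()) / span


def abnormal_frame_indices(scores, threshold=0.5):
    """Indices of frames whose score exceeds a hypothetical threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Synthetic demo: six frames, frame 3 is generated much worse than the rest.
target = np.full((6, 8, 8), 0.5)
generated = target + 0.01
generated[3] = target[3] + 0.2

scores = anomaly_scores([psnr(g, t) for g, t in zip(generated, target)])
flagged = abnormal_frame_indices(scores)
```

In a two-module design like the one described, a list such as `flagged` would be what the localization module receives, so that the object detector runs only on frames already judged abnormal.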
Related papers
50 in total
  • [1] Improving Video Vision Transformer for Deepfake Video Detection Using Facial Landmark, Depthwise Separable Convolution and Self Attention
    Ramadhani, Kurniawan Nur
    Munir, Rinaldi
    Utama, Nugraha Priya
    IEEE ACCESS, 2024, 12 : 8932 - 8939
  • [2] TransAnomaly: Video Anomaly Detection Using Video Vision Transformer
    Yuan, Hongchun
    Cai, Zhenyu
    Zhou, Hui
    Wang, Yue
    Chen, Xiangzhi
    IEEE ACCESS, 2021, 9 : 123977 - 123986
  • [3] DeepFake detection with multi-scale convolution and vision transformer
    Lin, Hao
    Huang, Wenmin
    Luo, Weiqi
    Lu, Wei
    DIGITAL SIGNAL PROCESSING, 2023, 134
  • [4] Detection and Localization of Abnormal Activities in Video Surveillance System
    Momin, B. F.
    Fanase, V. M.
    2015 INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND SIGNAL PROCESSING (ICCSP), 2015 : 277 - 280
  • [5] Improved Deepfake Video Detection Using Convolutional Vision Transformer
    Deressa, Deressa Wodajo
    Lambert, Peter
    Van Wallendael, Glenn
    Atnafu, Solomon
    Mareen, Hannes
    2024 IEEE GAMING, ENTERTAINMENT, AND MEDIA CONFERENCE, GEM 2024, 2024 : 492 - 497
  • [6] ABNORMAL CROWD BEHAVIOR DETECTION IN VIDEO SYSTEMS
    Tokta, Aybars
    Hocaoglu, A. Koksal
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016 : 697 - 700
  • [7] Learning Video Localization on Segment-Level Video Copy Detection with Transformer
    Zhang, Chi
    Liu, Jie
    Zhang, Shuwu
    Zeng, Zhi
    Huang, Ying
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII, 2023, 14260 : 439 - 450
  • [8] ViViT: A Video Vision Transformer
    Arnab, Anurag
    Dehghani, Mostafa
    Heigold, Georg
    Sun, Chen
    Lucic, Mario
    Schmid, Cordelia
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021 : 6816 - 6826
  • [9] Multi-Contextual Predictions with Vision Transformer for Video Anomaly Detection
    Lee, Jooyeon
    Nam, Woo-Jeoung
    Lee, Seong-Whan
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022 : 1012 - 1018
  • [10] Deep Appearance Features for Abnormal Behavior Detection in Video
    Smeureanu, Sorina
    Ionescu, Radu Tudor
    Popescu, Marius
    Alexe, Bogdan
    IMAGE ANALYSIS AND PROCESSING (ICIAP 2017), PT II, 2017, 10485 : 779 - 789