Multi-Level Two-Stream Fusion-Based Spatio-Temporal Attention Model for Violence Detection and Localization

Cited by: 7
Authors:
Asad, Mujtaba [1 ]
Jiang, He [1 ]
Yang, Jie [1 ]
Tu, Enmei [1 ]
Malik, Aftab A. [2 ]
Affiliations:
[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200240, Peoples R China
[2] Lahore Garrison Univ, Dept Software Engn, Lahore 54810, Pakistan
Keywords: Violence detection; autonomous video surveillance; multi-layer feature fusion; spatio-temporal attention; RECOGNITION; NETWORKS
DOI: 10.1142/S0218001422550023
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Detection of violent human behavior is necessary for public safety and monitoring. However, it demands constant human observation and attention in human-based surveillance systems, which is a challenging task. Autonomous detection of violent human behavior is therefore essential for continuous, uninterrupted video surveillance. In this paper, we propose a novel method for violence detection and localization in videos using a fusion of spatio-temporal features and an attention model. The model consists of a Fusion Convolutional Neural Network (Fusion-CNN), spatio-temporal attention modules, and Bi-directional Convolutional LSTMs (BiConvLSTM). The Fusion-CNN learns both spatial and temporal features by combining multi-level inter-layer features from both RGB and optical-flow input frames. The spatial attention module generates an importance mask that focuses on the most salient areas of each image frame. The temporal attention part, based on BiConvLSTM, identifies the video frames most relevant to violent activity. The proposed model can also localize and discriminate prominent regions in both the spatial and temporal domains, given weakly supervised training with only video-level classification labels. Experimental results on several publicly available benchmark datasets show the superior performance of the proposed model compared with existing methods. Our model achieves improved accuracies (ACC) of 89.1%, 99.1%, and 98.15% on the RWF-2000, HockeyFight, and Crowd-Violence datasets, respectively. For the CCTV-FIGHTS dataset, we use the mean average precision (mAP) metric, and our model obtains 80.7% mAP.
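The spatial-attention idea the abstract describes — an importance mask computed over fused two-stream feature maps, which then reweights those features — can be sketched in a minimal form as follows. This is an illustrative sketch only, not the authors' implementation: the channel counts, the channel-wise concatenation used for fusion, and the 1x1-convolution-plus-sigmoid mask are all assumptions for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(features, w, b):
    """Collapse a (C, H, W) feature map into a single (H, W) importance mask
    via a 1x1 convolution (a weighted sum over channels) followed by a
    sigmoid, then reweight every channel by that mask."""
    mask = sigmoid(np.tensordot(w, features, axes=([0], [0])) + b)  # (H, W)
    return features * mask[None, :, :], mask

# Toy usage: fuse per-frame features from the RGB and optical-flow streams.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 4, 4))    # 8 channels from the RGB stream
flow_feat = rng.standard_normal((8, 4, 4))   # 8 channels from the flow stream
fused = np.concatenate([rgb_feat, flow_feat], axis=0)  # channel-wise fusion

w = rng.standard_normal(16) * 0.1  # 1x1-conv weights, one per fused channel
attended, mask = spatial_attention(fused, w, b=0.0)

print(attended.shape)  # attended features keep the fused shape, (16, 4, 4)
```

In the paper this masking is learned end-to-end inside the Fusion-CNN and is followed by BiConvLSTM-based temporal attention over frames; the sketch above covers only the per-frame spatial reweighting step.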
Pages: 25