Multi-Level Two-Stream Fusion-Based Spatio-Temporal Attention Model for Violence Detection and Localization

Cited by: 7
Authors
Asad, Mujtaba [1]
Jiang, He [1]
Yang, Jie [1]
Tu, Enmei [1]
Malik, Aftab A. [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200240, Peoples R China
[2] Lahore Garrison Univ, Dept Software Engn, Lahore 54810, Pakistan
Keywords
Violence detection; autonomous video surveillance; multi-layer feature fusion; spatio-temporal attention; RECOGNITION; NETWORKS;
DOI
10.1142/S0218001422550023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Detecting violent human behavior is necessary for public safety and monitoring. In human-operated surveillance systems, however, it demands constant observation and attention, which is difficult to sustain. Autonomous detection of violent behavior is therefore essential for continuous, uninterrupted video surveillance. In this paper, we propose a novel method for violence detection and localization in videos that fuses spatio-temporal features with an attention model. The model consists of a Fusion Convolutional Neural Network (Fusion-CNN), spatio-temporal attention modules and Bi-directional Convolutional LSTMs (BiConvLSTM). The Fusion-CNN learns both spatial and temporal features by combining multi-level inter-layer features from RGB and optical flow input frames. The spatial attention module generates an importance mask that focuses on the most important areas of each image frame. The temporal attention part, which is based on BiConvLSTM, identifies the video frames most relevant to violent activity. The proposed model can also localize and discriminate prominent regions in both the spatial and temporal domains, even though it is trained in a weakly supervised manner with only video-level classification labels. Experimental results on several publicly available benchmark datasets show that the proposed model outperforms existing methods. Our model achieves accuracies (ACC) of 89.1%, 99.1% and 98.15% on the RWF-2000, HockeyFight and Crowd-Violence datasets, respectively. For the CCTV-FIGHTS dataset, we use mean average precision (mAP) as the performance metric, and our model obtains 80.7% mAP.
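The two attention steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature maps are random stand-ins for Fusion-CNN output, the scoring vectors `w` and `v` are hypothetical, and a plain softmax scoring replaces the BiConvLSTM-based temporal attention of the paper. It only shows the general idea of a per-frame importance mask followed by a per-frame weighting.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(features, w):
    """features: (T, H, W, C) feature maps; w: (C,) hypothetical scoring vector.
    Returns an importance mask (T, H, W) that sums to 1 within each frame,
    and the mask-weighted features."""
    T, H, W, C = features.shape
    scores = features @ w                                  # (T, H, W)
    mask = softmax(scores.reshape(T, H * W), axis=1).reshape(T, H, W)
    return mask, features * mask[..., None]

def temporal_attention(frame_vectors, v):
    """frame_vectors: (T, C); v: (C,) hypothetical scoring vector.
    Returns per-frame weights (T,) and the weighted video descriptor (C,).
    Assumption: simple softmax scoring, not the paper's BiConvLSTM."""
    weights = softmax(frame_vectors @ v, axis=0)
    return weights, weights @ frame_vectors

# Random stand-ins for fused RGB / optical-flow features of an 8-frame clip.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 7, 7, 16))
w = rng.normal(size=16)
v = rng.normal(size=16)

mask, attended = spatial_attention(features, w)
frame_vectors = attended.sum(axis=(1, 2))                  # (T, C)
weights, video_vector = temporal_attention(frame_vectors, v)
```

In this sketch, `mask` plays the role of the spatial importance mask and `weights` indicates which frames contribute most to the clip-level descriptor; with video-level labels only, such weights are what a weakly supervised model could use for localization.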
Pages: 25