ResSTANet: deep residual spatio-temporal attention network for violent action recognition

Cited by: 0
Authors
Ajeet Pandey
Piyush Kumar
Affiliation
[1] National Institute of Technology Patna, Computer Science and Engineering
Keywords
Convolutional neural network; Violent action recognition; Multi-head attention; Residual connection
DOI
10.1007/s41870-024-01799-w
Abstract
Violent Action Recognition (VAR) is a critical research area within computer vision and artificial intelligence that aims to automatically detect violent behaviour in videos. Most current VAR methods cannot extract spatial and temporal features simultaneously, which is crucial for human action recognition. This paper introduces a Residual Spatio-Temporal Attention Network (ResSTANet) for robust violent action recognition. ResSTANet uses a residual 3-Dimensional Convolutional Neural Network (3D-CNN) to capture spatial and temporal dynamics jointly; the residual connection improves information flow and preserves critical spatio-temporal features. The output of the residual 3D-CNN is passed through a Multi-Head Attention (Mu-HA) mechanism, which sharpens the focus on the most informative features. Multiple dense and dropout layers then refine the feature representation and reduce noise, and a final softmax layer performs action recognition, achieving state-of-the-art performance on VAR tasks.
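This record contains no code, so the following is only a minimal PyTorch sketch of the pipeline the abstract describes (residual 3D-CNN, multi-head attention, dense and dropout layers, softmax classifier). The class names, layer widths, clip size, and two-class output are illustrative assumptions, not the authors' published configuration.

# Hypothetical sketch of the ResSTANet pipeline described in the abstract:
# residual 3D-CNN -> multi-head attention -> dense + dropout -> softmax.
# All sizes below are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn


class Residual3DBlock(nn.Module):
    """3D convolutional block with a residual (skip) connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)
        self.skip = nn.Conv3d(in_ch, out_ch, kernel_size=1)  # match channels on the skip path
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + self.skip(x))  # residual addition improves information flow


class ResSTANetSketch(nn.Module):
    def __init__(self, num_classes=2, embed_dim=64, num_heads=4):
        super().__init__()
        self.backbone = nn.Sequential(
            Residual3DBlock(3, 32),
            nn.MaxPool3d(2),
            Residual3DBlock(32, embed_dim),
            nn.MaxPool3d(2),
        )
        # multi-head self-attention over flattened spatio-temporal tokens
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),  # dropout to reduce noise in the representation
            nn.Linear(128, num_classes),
        )

    def forward(self, clips):                      # clips: (B, 3, T, H, W)
        feats = self.backbone(clips)               # (B, C, T', H', W')
        tokens = feats.flatten(2).transpose(1, 2)  # (B, T'*H'*W', C) tokens for attention
        attended, _ = self.attn(tokens, tokens, tokens)
        pooled = attended.mean(dim=1)              # average-pool attended tokens
        return self.head(pooled).softmax(dim=-1)   # class probabilities (violent / non-violent)


if __name__ == "__main__":
    model = ResSTANetSketch()
    video = torch.randn(2, 3, 16, 64, 64)  # two 16-frame RGB clips at 64x64
    print(model(video).shape)               # torch.Size([2, 2])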
Pages: 2891 - 2900
Page count: 9
Related Papers
50 in total
  • [1] A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention
    Yang, Qi
    Lu, Tongwei
    Zhou, Huabing
    [J]. ENTROPY, 2022, 24 (03)
  • [2] SPATIO-TEMPORAL SLOWFAST SELF-ATTENTION NETWORK FOR ACTION RECOGNITION
    Kim, Myeongjun
    Kim, Taehun
    Kim, Daijin
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2206 - 2210
  • [3] Facial Expression Recognition Based on Deep Spatio-Temporal Attention Network
    Li, Shuqin
    Zheng, Xiangwei
    Zhang, Xia
    Chen, Xuanchi
    Li, Wei
    [J]. COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT II, 2022, 461 : 516 - 532
  • [4] Efficient spatio-temporal network for action recognition
    Su, Yanxiong
    Zhao, Qian
    [J]. JOURNAL OF REAL-TIME IMAGE PROCESSING, 2024, 21 (05)
  • [5] Interpretable Spatio-temporal Attention for Video Action Recognition
    Meng, Lili
    Zhao, Bo
    Chang, Bo
    Huang, Gao
    Sun, Wei
    Tung, Frederick
    Sigal, Leonid
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 1513 - 1522
  • [6] Spatio-Temporal Attention Networks for Action Recognition and Detection
    Li, Jun
    Liu, Xianglong
    Zhang, Wenxuan
    Zhang, Mingyuan
    Song, Jingkuan
    Sebe, Nicu
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (11) : 2990 - 3001
  • [7] Spatio-Temporal Deep Residual Network with Hierarchical Attentions for Video Event Recognition
    Li, Yonggang
    Liu, Chunping
    Ji, Yi
    Gong, Shengrong
    Xu, Haibao
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2020, 16 (02)
  • [8] Deep residual infrared action recognition by integrating local and global spatio-temporal cues
    Imran, Javed
    Raman, Balasubramanian
    [J]. INFRARED PHYSICS & TECHNOLOGY, 2019, 102
  • [9] Spatio-Temporal Self-Attention Weighted VLAD Neural Network for Action Recognition
    Cheng, Shilei
    Xie, Mei
    Ma, Zheng
    Li, Siqi
    Gu, Song
    Yang, Feng
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (01) : 220 - 224
  • [10] Unified Spatio-Temporal Attention Networks for Action Recognition in Videos
    Li, Dong
    Yao, Ting
    Duan, Ling-Yu
    Mei, Tao
    Rui, Yong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (02) : 416 - 428