Cascading spatio-temporal attention network for real-time action detection

Cited by: 0
Authors
Jianhua Yang
Ke Wang
Ruifeng Li
Petra Perner
Affiliations
[1] Harbin Institute of Technology, State Key Laboratory of Robotics and System
[2] Harbin Institute of Technology, Zhengzhou Research Institute
[3] FutureLab Artificial Intelligence IBaI-2
Keywords
Spatio-temporal action detection; Human behavior analysis; Spatio-temporal attention
DOI
Not available
Abstract
Accurately detecting human actions in video has many applications, such as video surveillance and motion-sensing games. In this paper, we propose a spatial-aware attention module (SAM) and a temporal-aware attention module (TAM) for spatio-temporal action detection in videos. SAM first concatenates the feature maps of consecutive frames along the channel dimension and then applies a dilated convolutional layer followed by a sigmoid function to generate a spatial attention map. The resulting attention map contains spatial information from consecutive frames, so it helps the detector focus on salient spatial features and localize action instances more accurately across frames. TAM deploys several fully connected layers to generate a temporal attention map. The temporal attention map focuses on the temporal association of each spatial feature; it captures the temporal association of action instances and thereby helps the detector track actions. To evaluate the effectiveness of SAM and TAM, we build an efficient and strong anchor-free action detector, the cascading spatio-temporal attention network, which equips a 2D backbone with the SAM and TAM modules. Extensive experiments on two benchmarks, JHMDB and UCF101-24, demonstrate the favorable performance of SAM and TAM.
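
The abstract describes the two attention modules only at a high level. The following is a minimal PyTorch sketch of how SAM and TAM could be realized from that description; the class names, the dilation rate, the hidden width of the fully connected layers, and the way the attention maps are applied back onto the features are assumptions for illustration, not the authors' exact implementation.

import torch
import torch.nn as nn

class SAM(nn.Module):
    # Spatial-aware attention (sketch): concatenate the feature maps of T
    # consecutive frames along the channel axis, then a dilated 3x3 convolution
    # followed by a sigmoid yields one spatial attention map shared by all frames.
    def __init__(self, channels, num_frames, dilation=2):
        super().__init__()
        self.conv = nn.Conv2d(channels * num_frames, 1, kernel_size=3,
                              padding=dilation, dilation=dilation)

    def forward(self, feats):                      # feats: (B, T, C, H, W)
        b, t, c, h, w = feats.shape
        stacked = feats.reshape(b, t * c, h, w)    # channel-wise concatenation of frames
        attn = torch.sigmoid(self.conv(stacked))   # spatial attention map, (B, 1, H, W)
        return feats * attn.unsqueeze(1)           # re-weight every frame's features

class TAM(nn.Module):
    # Temporal-aware attention (sketch): fully connected layers over the temporal
    # axis produce a per-frame weight for each spatial feature location.
    def __init__(self, num_frames, hidden=16):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(num_frames, hidden),
                                nn.ReLU(),
                                nn.Linear(hidden, num_frames))

    def forward(self, feats):                      # feats: (B, T, C, H, W)
        b, t, c, h, w = feats.shape
        x = feats.permute(0, 2, 3, 4, 1).reshape(-1, t)       # one temporal vector per location
        attn = torch.sigmoid(self.fc(x)).reshape(b, c, h, w, t)
        return feats * attn.permute(0, 4, 1, 2, 3)             # temporal re-weighting

In the cascading arrangement named in the title, SAM and TAM would be applied in sequence to the 2D backbone features before the anchor-free detection heads; whether the attention maps are multiplied in directly or added residually is not specified in the abstract, so the multiplicative form above is an assumption.
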
Related papers
50 records in total
  • [41] Real-time abandoned and stolen object detection based on spatio-temporal features in crowded scenes
    Nam, Yunyoung
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (12) : 7003 - 7028
  • [42] Real-Time Human Fault Detection in Assembly Tasks, Based on Human Action Prediction Using a Spatio-Temporal Learning Model
    Zhang, Zhujun
    Peng, Gaoliang
    Wang, Weitian
    Chen, Yi
    [J]. SUSTAINABILITY, 2022, 14 (15)
  • [43] Spatio-Temporal Action Detector with Self-Attention
    Ma, Xurui
    Luo, Zhigang
    Zhang, Xiang
    Liao, Qing
    Shen, Xingyu
    Wang, Mengzhu
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [44] Interpretable Spatio-temporal Attention for Video Action Recognition
    Meng, Lili
    Zhao, Bo
    Chang, Bo
    Huang, Gao
    Sun, Wei
    Tung, Frederich
    Sigal, Leonid
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 1513 - 1522
  • [45] Cascading Spatio-Temporal Pattern Discovery
    Mohan, Pradeep
    Shekhar, Shashi
    Shine, James A.
    Rogers, James P.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (11) : 1977 - 1992
  • [46] Efficient spatio-temporal network for action recognition
    Su, Yanxiong
    Zhao, Qian
    [J]. JOURNAL OF REAL-TIME IMAGE PROCESSING, 2024, 21 (05)
  • [47] Towards Edge-Aware Spatio-Temporal Filtering in Real-Time
    Schaffner, Michael
    Scheidegger, Florian
    Cavigelli, Lukas
    Kaeslin, Hubert
    Benini, Luca
    Smolic, Aljosa
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (01) : 265 - 280
  • [48] Dynamic Real-Time Spatio-Temporal Acquisition and Rendering in Adverse Environments
    Dutta, Somnath
    Ganovelli, Fabio
    Cignoni, Paolo
    [J]. GEOGRAPHICAL INFORMATION SYSTEMS THEORY, APPLICATIONS AND MANAGEMENT, GISTAM 2023, 2024, 2107 : 34 - 53
  • [49] Nonlinear Spatio-temporal Wave Computing for Real-time Applications on GPU
    Tukel, Mehmet
    Yeniceri, Ramazan
    Yalcin, Mustak E.
    [J]. 2012 13TH INTERNATIONAL WORKSHOP ON CELLULAR NANOSCALE NETWORKS AND THEIR APPLICATIONS (CNNA), 2012
  • [50] Real-Time Video Sequences Matching Using the Spatio-Temporal Fingerprint
    Pribula, Ondrej
    Pohanka, Jan
    Fischer, Jan
    [J]. MELECON 2010: THE 15TH IEEE MEDITERRANEAN ELECTROTECHNICAL CONFERENCE, 2010, : 911 - 916