Improved SSD using deep multi-scale attention spatial–temporal features for action recognition

Authors
Shuren Zhou
Jia Qiu
Arun Solanki
Affiliations
[1] Changsha University of Science and Technology, School of Computer and Communication Engineering
[2] Gautam Buddha University, School of Information and Communication Technology
Source
Multimedia Systems | 2022 / Vol. 28
Keywords
Action recognition; Multi-scale spatial–temporal feature; Attention mechanism
DOI
Not available
Abstract
The biggest difference between video-based and image-based action recognition is that video adds a temporal dimension. Most deep-learning methods for action recognition handle it in one of two ways: (1) using 3D convolution to model temporal features, or (2) introducing an auxiliary temporal feature such as optical flow. However, 3D convolutional networks usually consume substantial computational resources, while optical flow requires a tedious extra extraction step and additional storage, and usually models only short-range temporal features. To construct better temporal features, we propose a multi-scale attention spatial–temporal feature network based on SSD. The video is divided into segments that span its whole length and is sampled sparsely, and a self-attention mechanism captures the relation between each frame and the sequence of frames sampled over the entire video, so that the network attends to the representative frames in the sequence. In addition, an attention mechanism assigns different weights to the inter-frame relations at different time scales, in order to reason about the contextual relations of actions along the time dimension. The proposed method achieves competitive performance on two commonly used datasets, UCF101 and HMDB51.
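The two ideas named in the abstract, segment-wise sparse sampling over the whole video and self-attention across the sampled frames, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sparse_sample_indices` and `self_attention` are hypothetical, the per-frame features are assumed to come from some backbone that is not modeled here, and the attention uses the raw features as queries, keys, and values with no learned projections.

```python
import math
import random


def sparse_sample_indices(num_frames, num_segments, rng=None):
    """Pick one frame index from each of `num_segments` equal-length segments.

    With rng=None the centre frame of each segment is chosen (deterministic);
    otherwise one frame is drawn at random inside each segment.
    """
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    indices = []
    for k in range(num_segments):
        start = (k * num_frames) // num_segments
        end = ((k + 1) * num_frames) // num_segments  # exclusive bound
        if rng is None:
            indices.append((start + end - 1) // 2)
        else:
            indices.append(rng.randrange(start, end))
    return indices


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over per-frame feature vectors.

    Assumption: queries, keys, and values are the features themselves.
    Returns (attended, weights), where weights[i][j] is how strongly
    frame i attends to frame j.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        row = softmax(scores)
        weights.append(row)
        attended.append([sum(row[j] * features[j][d]
                             for j in range(len(features)))
                         for d in range(dim)])
    return attended, weights


if __name__ == "__main__":
    # Sample 4 frames from a 100-frame clip, one per segment.
    picked = sparse_sample_indices(100, 4, rng=random.Random(0))
    # Toy 2-D features standing in for backbone outputs of 3 sampled frames.
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    _, attn = self_attention(toy)
    print(picked)
    print(attn)
```

Because each sampled frame comes from a different segment, the attention weights relate frames across the full duration of the clip rather than within a short window, which is the long-range property the abstract emphasizes.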
Pages: 2123-2131
Page count: 8