A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention

Cited by: 8
Authors
Yang, Qi [1 ,2 ]
Lu, Tongwei [1 ,2 ]
Zhou, Huabing [1 ,2 ]
Affiliations
[1] Wuhan Inst Technol, Sch Comp Sci & Engn, Wuhan 430205, Peoples R China
[2] Wuhan Inst Technol, Hubei Key Lab Intelligent Robot, Wuhan 430205, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
temporal modeling; spatio-temporal motion; group convolution; spatial attention
DOI
10.3390/e24030368
Chinese Library Classification number
O4 [Physics]
Subject classification code
0702
Abstract
Temporal modeling is key to action recognition in videos, but traditional 2D CNNs do not capture temporal relationships well. 3D CNNs can achieve good performance, but they are computationally intensive and difficult to deploy on existing devices. To address these problems, we design a generic and effective module called the spatio-temporal motion network (SMNet). SMNet keeps the complexity of a 2D CNN, reducing computational cost while achieving performance comparable to 3D CNNs. SMNet contains a spatio-temporal excitation module (SE) and a motion excitation module (ME). The SE module uses group convolution to fuse temporal information, which reduces the number of parameters in the network, and uses spatial attention to extract spatial information. The ME module uses the difference between adjacent frames to extract feature-level motion patterns, which effectively encodes motion features and helps identify actions efficiently. We use ResNet-50 as the backbone network and insert SMNet into the residual blocks to form a simple and effective action network. Experimental results on three datasets, namely Something-Something V1, Something-Something V2, and Kinetics-400, show that it outperforms state-of-the-art motion recognition networks.
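Two ideas named in the abstract can be illustrated with a small sketch: group convolution lowers the parameter count, and motion can be approximated by differencing the features of adjacent frames. The code below is a minimal illustration in plain Python, not the paper's implementation. The function names, the per-frame feature-vector layout (T x C), and the zero padding of the last frame are all assumptions made for this example.

```python
from typing import List


def grouped_conv_params(in_channels: int, out_channels: int,
                        kernel_size: int, groups: int = 1) -> int:
    """Weight count (bias excluded) of a 1D convolution split into `groups`.

    Each output channel only sees in_channels / groups input channels,
    so the weight count shrinks by a factor of `groups`.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (in_channels // groups) * out_channels * kernel_size


def adjacent_frame_differences(features: List[List[float]]) -> List[List[float]]:
    """Feature-level motion as x[t+1] - x[t] for each pair of adjacent frames.

    `features` holds one feature vector per frame (T x C). The last frame has
    no successor, so its row is filled with zeros (an assumption of this
    sketch, chosen to keep the output the same length as the input).
    """
    if not features:
        return []
    diffs = [
        [nxt - cur for cur, nxt in zip(features[t], features[t + 1])]
        for t in range(len(features) - 1)
    ]
    diffs.append([0.0] * len(features[-1]))
    return diffs


if __name__ == "__main__":
    # Temporal kernel of size 3 over 64 channels: dense vs. channel-wise groups.
    print(grouped_conv_params(64, 64, 3, groups=1))   # 12288
    print(grouped_conv_params(64, 64, 3, groups=64))  # 192

    # Three frames with two feature channels each.
    frames = [[1.0, 2.0], [1.5, 2.0], [3.0, 0.0]]
    print(adjacent_frame_differences(frames))
    # [[0.5, 0.0], [1.5, -2.0], [0.0, 0.0]]
```

In this toy setting, setting `groups` equal to the channel count cuts the temporal-convolution weights by a factor of 64, and the difference rows are largest where the features change most between frames. How the paper turns such differences into attention weights is not specified in the abstract, so that step is left out here.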
Pages: 19
Related papers
50 in total
  • [31] Online action proposal generation using spatio-temporal attention network
    Keisham, Kanchan
    Jalali, Amin
    Lee, Minho
    [J]. NEURAL NETWORKS, 2022, 153 : 518 - 529
  • [32] Three-stream spatio-temporal attention network for first-person action and interaction recognition
    Javed Imran
    Balasubramanian Raman
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2022, 13 : 1137 - 1152
  • [33] Three-stream spatio-temporal attention network for first-person action and interaction recognition
    Imran, Javed
    Raman, Balasubramanian
    [J]. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2022, 13 (02) : 1137 - 1152
  • [34] STHARNet: spatio-temporal human action recognition network in content based video retrieval
    Sowmyayani, S.
    Rani, P. Arockia Jansi
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 82 (24) : 38051 - 38066
  • [35] Skeleton Action Recognition Based on Spatio-temporal Feature Enhanced Graph Convolutional Network
    Cao, Yi
    Wu, Weiguan
    Li, Ping
    Xia, Yu
    Gao, Qingyuan
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2023, 45 (08) : 3022 - 3031
  • [36] STHARNet: spatio-temporal human action recognition network in content based video retrieval
    S. Sowmyayani
    P. Arockia Jansi Rani
    [J]. Multimedia Tools and Applications, 2023, 82 : 38051 - 38066
  • [37] Spatio-temporal neural network with handcrafted features for skeleton-based action recognition
    Nan, Mihai
    Trascau, Mihai
    Florea, Adina-Magda
    [J]. NEURAL COMPUTING & APPLICATIONS, 2024, : 9221 - 9243
  • [38] Spatio-temporal deformable 3D ConvNets with attention for action recognition
    Li, Jun
    Liu, Xianglong
    Zhang, Mingyuan
    Wang, Deqing
    [J]. PATTERN RECOGNITION, 2020, 98
  • [39] Transforming spatio-temporal self-attention using action embedding for skeleton-based action recognition
    Ahmad, Tasweer
    Rizvi, Syed Tahir Hussain
    Kanwal, Neel
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95
  • [40] Multimodal human action recognition based on spatio-temporal action representation recognition model
    Wu, Qianhan
    Huang, Qian
    Li, Xing
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (11) : 16409 - 16430