A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention

Cited: 8
Authors
Yang, Qi [1 ,2 ]
Lu, Tongwei [1 ,2 ]
Zhou, Huabing [1 ,2 ]
Affiliations
[1] Wuhan Inst Technol, Sch Comp Sci & Engn, Wuhan 430205, Peoples R China
[2] Wuhan Inst Technol, Hubei Key Lab Intelligent Robot, Wuhan 430205, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
temporal modeling; spatio-temporal motion; group convolution; spatial attention;
DOI
10.3390/e24030368
Chinese Library Classification
O4 [Physics];
Subject Classification Code
0702;
Abstract
Temporal modeling is the key to action recognition in videos, but traditional 2D CNNs do not capture temporal relationships well. 3D CNNs can achieve good performance, but they are computationally intensive and impractical on many existing devices. To address these problems, we design a generic and effective module called the spatio-temporal motion network (SMNet). SMNet keeps the complexity of a 2D CNN and reduces the computational cost of the algorithm while achieving performance comparable to 3D CNNs. SMNet contains a spatio-temporal excitation module (SE) and a motion excitation module (ME). The SE module uses group convolution to fuse temporal information, which reduces the number of parameters in the network, and uses spatial attention to extract spatial information. The ME module uses the differences between adjacent frames to extract feature-level motion patterns, which effectively encodes motion features and helps identify actions efficiently. We use ResNet-50 as the backbone network and insert SMNet into its residual blocks to form a simple and effective action recognition network. Experimental results on three datasets, namely Something-Something V1, Something-Something V2, and Kinetics-400, show that SMNet outperforms state-of-the-art action recognition networks.
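The abstract describes the two modules only at a high level. Below is a minimal, illustrative PyTorch sketch of how an SE-style grouped temporal convolution with spatial attention and an ME-style adjacent-frame difference module could be wired up. The class names, tensor layout, kernel sizes, group count, and sigmoid gating are assumptions made for illustration; they are not taken from the authors' implementation.

```python
# Illustrative sketch only: layouts, kernel sizes, and gating are assumptions,
# not the paper's actual implementation.
import torch
import torch.nn as nn


class SpatioTemporalExcitation(nn.Module):
    """SE-style sketch: fuse temporal context with a grouped temporal
    convolution, then re-weight spatial positions with a sigmoid attention map."""

    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        # Grouped 1D convolution along the temporal axis keeps parameters low
        # (assumes channels is divisible by groups).
        self.temporal_conv = nn.Conv1d(channels, channels, kernel_size=3,
                                       padding=1, groups=groups)
        # 1x1 convolution produces a single-channel spatial attention map.
        self.spatial_att = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        n, t, c, h, w = x.shape
        # Temporal fusion, treating each spatial position independently.
        y = x.permute(0, 3, 4, 2, 1).reshape(n * h * w, c, t)
        y = self.temporal_conv(y)
        y = y.reshape(n, h, w, c, t).permute(0, 4, 3, 1, 2)
        # Spatial attention computed from the temporally fused features.
        att = torch.sigmoid(self.spatial_att(y.reshape(n * t, c, h, w)))
        out = x.reshape(n * t, c, h, w) * att
        return out.reshape(n, t, c, h, w)


class MotionExcitation(nn.Module):
    """ME-style sketch: encode feature-level motion as differences
    between adjacent frames."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        n, t, c, h, w = x.shape
        # Difference of adjacent frames; pad the last step with zeros to keep T.
        diff = x[:, 1:] - x[:, :-1]
        diff = torch.cat([diff, torch.zeros_like(x[:, :1])], dim=1)
        motion = self.conv(diff.reshape(n * t, c, h, w))
        # Excite the original features with sigmoid-gated motion cues.
        out = x.reshape(n * t, c, h, w) * torch.sigmoid(motion)
        return out.reshape(n, t, c, h, w)
```

In line with the abstract, modules like these could be placed inside each ResNet-50 residual block; the exact insertion point and how the SE and ME outputs are combined are not specified here and would follow the paper.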
Pages: 19
Related Papers
50 records in total
  • [1] SPATIO-TEMPORAL SLOWFAST SELF-ATTENTION NETWORK FOR ACTION RECOGNITION
    Kim, Myeongjun
    Kim, Taehun
    Kim, Daijin
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2206 - 2210
  • [2] MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module
    Zhang, Yi
    [J]. SENSORS, 2022, 22 (17)
  • [3] Efficient spatio-temporal network for action recognition
    Su, Yanxiong
    Zhao, Qian
    [J]. JOURNAL OF REAL-TIME IMAGE PROCESSING, 2024, 21 (05)
  • [4] Spatio-temporal segments attention for skeleton-based action recognition
    Qiu, Helei
    Hou, Biao
    Ren, Bo
    Zhang, Xiaohua
    [J]. NEUROCOMPUTING, 2023, 518 : 30 - 38
  • [5] Resstanet: deep residual spatio-temporal attention network for violent action recognition
    Ajeet Pandey
    Piyush Kumar
    [J]. International Journal of Information Technology, 2024, 16 (5) : 2891 - 2900
  • [6] Interpretable Spatio-temporal Attention for Video Action Recognition
    Meng, Lili
    Zhao, Bo
    Chang, Bo
    Huang, Gao
    Sun, Wei
    Tung, Frederich
    Sigal, Leonid
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 1513 - 1522
  • [7] Spatio-Temporal Attention Networks for Action Recognition and Detection
    Li, Jun
    Liu, Xianglong
    Zhang, Wenxuan
    Zhang, Mingyuan
    Song, Jingkuan
    Sebe, Nicu
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (11) : 2990 - 3001
  • [8] ACTION RECOGNITION USING SPATIO-TEMPORAL DIFFERENTIAL MOTION
    Yadav, Gaurav Kumar
    Sethi, Amit
    [J]. 2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3415 - 3419
  • [9] Facial Expression Recognition Based on Deep Spatio-Temporal Attention Network
    Li, Shuqin
    Zheng, Xiangwei
    Zhang, Xia
    Chen, Xuanchi
    Li, Wei
    [J]. COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT II, 2022, 461 : 516 - 532
  • [10] Dual Stream Spatio-Temporal Motion Fusion With Self-Attention For Action Recognition
    Jalal, Md Asif
    Aftab, Waqas
    Moore, Roger K.
    Mihaylova, Lyudmila
    [J]. 2019 22ND INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2019), 2019,