Temporal Shift Module-Based Vision Transformer Network for Action Recognition

Cited by: 1
Authors
Zhang, Kunpeng [1]
Lyu, Mengyan [1]
Guo, Xinxin [1]
Zhang, Liye [1]
Liu, Cong [1]
Affiliations
[1] Shandong Univ Technol, Coll Comp Sci & Technol, Zibo 255000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Computational modeling; Convolutional neural networks; Computer architecture; Task analysis; Image segmentation; Head; Action recognition; self-attention; temporal shift module; vision transformer;
DOI
10.1109/ACCESS.2024.3379885
CLC number (Chinese Library Classification)
TP [automation technology, computer technology];
Discipline code
0812;
Abstract
This paper introduces ViT-Shift, a novel action recognition model that combines the Temporal Shift Module (TSM) with the Vision Transformer (ViT) architecture. Video action recognition is computationally demanding, requiring substantial computing resources; the proposed model addresses this by incorporating the TSM, achieving strong performance while significantly reducing computational cost. The approach applies the Transformer self-attention mechanism to video sequence processing instead of traditional convolutional methods. To preserve the core ViT architecture and transfer its strong image-recognition performance to video action recognition, the TSM is strategically inserted only before the multi-head attention layer of ViT. This design simulates temporal interaction through channel shifts, effectively reducing computational complexity. The position and shift parameters of the TSM are carefully designed to maximize performance. On two standard action recognition benchmarks with ImageNet-21K pretraining, ViT-Shift achieves an accuracy of 77.55% on Kinetics-400 and 93.07% on UCF-101.
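To make the abstract's description concrete, the following is a minimal PyTorch sketch of a TSM-style channel shift applied to ViT tokens immediately before multi-head attention. The function name temporal_shift, the shift fraction shift_div=8, the zero-padded temporal boundaries, and the (batch * frames, tokens, channels) token layout are illustrative assumptions, not the paper's released implementation; handling of the class token is omitted.

import torch

def temporal_shift(x, num_frames, shift_div=8):
    # x: ViT tokens of shape (batch * num_frames, num_tokens, channels).
    # A 1/shift_div slice of channels moves one frame forward in time,
    # another 1/shift_div slice moves one frame backward; the rest stay put.
    bt, n, c = x.shape
    b = bt // num_frames
    x = x.view(b, num_frames, n, c)
    fold = c // shift_div

    out = torch.zeros_like(x)
    out[:, 1:, :, :fold] = x[:, :-1, :, :fold]                   # shift forward in time
    out[:, :-1, :, fold:2 * fold] = x[:, 1:, :, fold:2 * fold]   # shift backward in time
    out[:, :, :, 2 * fold:] = x[:, :, :, 2 * fold:]              # unshifted channels
    return out.view(bt, n, c)

# Hypothetical placement inside one ViT encoder block (attn, norm1, norm2, mlp assumed):
#   tokens = temporal_shift(tokens, num_frames=T)
#   tokens = tokens + attn(norm1(tokens))
#   tokens = tokens + mlp(norm2(tokens))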
Pages: 47246-47257
Page count: 12
Related papers
50 records in total
  • [1] Temporal Shift Vision Transformer Adapter for Efficient Video Action Recognition
    Shi, Yaning
    Sun, Pu
    Gu, Bing
    Li, Longfei
    PROCEEDINGS OF 2024 4TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND INTELLIGENT COMPUTING, BIC 2024, 2024: 42-46
  • [2] Attention module-based spatial-temporal graph convolutional networks for skeleton-based action recognition
    Kong, Yinghui
    Li, Li
    Zhang, Ke
    Ni, Qiang
    Han, Jungong
    JOURNAL OF ELECTRONIC IMAGING, 2019, 28 (04)
  • [3] Mechanism of action of Salvianolic Acid B by module-based network analysis
    Ren, Zhenzhen
    Wang, Xing
    Wang, Shifeng
    Zhai, Chenxi
    He, Yusu
    Zhang, Yanling
    Qiao, Yanjiang
    BIO-MEDICAL MATERIALS AND ENGINEERING, 2014, 24 (01): 1333-1340
  • [4] D-TSM: Discriminative Temporal Shift Module for Action Recognition
    Lee, Sangyun
    Hong, Sungjun
    2023 20TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS, UR, 2023: 133-136
  • [5] STSM: Spatio-Temporal Shift Module for Efficient Action Recognition
    Yang, Zhaoqilin
    An, Gaoyun
    Zhang, Ruichen
    MATHEMATICS, 2022, 10 (18)
  • [6] Deformable patch embedding-based shift module-enhanced transformer for panoramic action recognition
    Zhang, Xiaoyan
    Cui, Yujie
    Huo, Yongkai
    VISUAL COMPUTER, 2023, 39 (08): 3247-3257
  • [7] Graph transformer network with temporal kernel attention for skeleton-based action recognition
    Liu, Yanan
    Zhang, Hao
    Xu, Dan
    He, Kangjian
    KNOWLEDGE-BASED SYSTEMS, 2022, 240
  • [8] Temporal Extension Module for Skeleton-Based Action Recognition
    Obinata, Yuya
    Yamamoto, Takuma
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 534-540