Mixed Attention and Channel Shift Transformer for Efficient Action Recognition

Cited: 0
Authors
Lu, Xiusheng [1 ]
Hao, Yanbin [2 ]
Cheng, Lechao [3 ]
Zhao, Sicheng [1 ]
Li, Yutao [4 ]
Song, Mingli [5 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] Hefei Univ Technol, Hefei, Peoples R China
[4] Ocean Univ China, Qingdao, Peoples R China
[5] Zhejiang Univ, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China
Keywords
Action recognition; mixed attention; random attention; channel shift
DOI
10.1145/3712594
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The practical use of Transformer-based methods for video processing is constrained by their high computational complexity. Although previous approaches adopt a spatiotemporal decomposition of 3D attention to mitigate this issue, they neglect the majority of visual tokens. This article presents a novel mixed attention operation that subtly fuses random, spatial, and temporal attention mechanisms. The proposed random attention stochastically samples video tokens in a simple yet effective way, complementing the other attention methods. Furthermore, since attention concentrates on learning long-distance relationships, we employ a channel shift operation to encode short-term temporal characteristics. Thanks to the amalgamation of these techniques, our model provides more comprehensive motion representations. Experimental results show that the proposed method achieves competitive action recognition accuracy with low computational overhead on both large-scale and small-scale public video datasets.
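The paper's exact formulation is not reproduced in this record. As a rough illustration only, the two ideas the abstract names can be sketched as follows: random attention reduces to sampling a subset of token indices to attend over, and channel shift (in the TSM style the abstract alludes to) moves a fraction of channels one step forward or backward along the time axis at zero extra FLOPs. Function names, the 2D `(T, C)` layout, and the `shift_ratio`/`sample_ratio` values are assumptions for this sketch, not the authors' code.

```python
import numpy as np

def sample_random_tokens(num_tokens, sample_ratio=0.5, rng=None):
    """Random attention's token selection (sketch): pick a random subset
    of token indices; attention is then computed only over this subset."""
    if rng is None:
        rng = np.random.default_rng()
    k = max(1, int(num_tokens * sample_ratio))
    return rng.choice(num_tokens, size=k, replace=False)

def channel_shift(x, shift_ratio=0.25):
    """TSM-style temporal channel shift (sketch): for input of shape
    (T, C) -- T frames, C channels, spatial dims folded out for clarity --
    shift the first k channels forward in time, the next k backward,
    and leave the rest unchanged, mixing short-term temporal context."""
    T, C = x.shape
    k = int(C * shift_ratio)          # channels shifted in each direction
    out = np.zeros_like(x)
    out[1:, :k] = x[:-1, :k]          # first k channels: shift forward in time
    out[:-1, k:2 * k] = x[1:, k:2 * k]  # next k channels: shift backward
    out[:, 2 * k:] = x[:, 2 * k:]     # remaining channels: untouched
    return out
```

The shift costs no multiplications, which is why the abstract can pair it with attention without raising the computational budget; the zero padding at the temporal boundaries mirrors the common TSM implementation choice.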
Pages: 20
Related Papers
50 records
  • [1] Temporal Shift Vision Transformer Adapter for Efficient Video Action Recognition
    Shi, Yaning
    Sun, Pu
    Gu, Bing
    Li, Longfei
    PROCEEDINGS OF 2024 4TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND INTELLIGENT COMPUTING, BIC 2024, 2024, : 42 - 46
  • [2] Efficient Continuous Sign Language Recognition with Temporal Shift and Channel Attention
    Nam, Nguyen Tu
    Takahashi, Hiroki
    HYBRID ARTIFICIAL INTELLIGENT SYSTEM, PT I, HAIS 2024, 2025, 14857 : 301 - 311
  • [3] DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
    Thanh-Dat Truong
    Quoc-Huy Bui
    Chi Nhan Duong
    Seo, Han-Seok
    Son Lam Phung
    Li, Xin
    Khoa Luu
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19998 - 20008
  • [4] Temporal Shift and Attention Modules for Graphical Skeleton Action Recognition
    Zhu, Haidong
    Zheng, Zhaoheng
    Nevatia, Ram
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3145 - 3151
  • [5] Human behavior recognition based on sparse transformer with channel attention mechanism
    Cao, Keyan
    Wang, Mingrui
    FRONTIERS IN PHYSIOLOGY, 2023, 14
  • [6] Differential motion attention network for efficient action recognition
    Liu, Caifeng
    Gu, Fangjie
    VISUAL COMPUTER, 2025, 41 (03): : 1719 - 1731
  • [7] STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition
    Ahn, Dasom
    Kim, Sangwon
    Hong, Hyunsu
    Ko, Byoung Chul
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 3319 - 3328
  • [8] Efficient convolutional dual-attention transformer for automatic modulation recognition
    Yi, Zengrui
    Meng, Hua
    Gao, Lu
    He, Zhonghang
    Yang, Meng
    APPLIED INTELLIGENCE, 2025, 55 (03)
  • [9] SCA Net: Sparse Channel Attention Module for Action Recognition
    Song, Hang
    Song, YongHong
    Zhang, YuanLin
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 1189 - 1196
  • [10] Temporal Shift Module-Based Vision Transformer Network for Action Recognition
    Zhang, Kunpeng
    Lyu, Mengyan
    Guo, Xinxin
    Zhang, Liye
    Liu, Cong
    IEEE ACCESS, 2024, 12 : 47246 - 47257