Mixed Attention and Channel Shift Transformer for Efficient Action Recognition

Authors
Lu, Xiusheng [1]
Hao, Yanbin [2]
Cheng, Lechao [3]
Zhao, Sicheng [1]
Li, Yutao [4]
Song, Mingli [5]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] Hefei Univ Technol, Hefei, Peoples R China
[4] Ocean Univ China, Qingdao, Peoples R China
[5] Zhejiang Univ, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Action recognition; mixed attention; random attention; channel shift
DOI
10.1145/3712594
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The practical use of the Transformer-based methods for processing videos is constrained by the high computing complexity. Although previous approaches adopt the spatiotemporal decomposition of 3D attention to mitigate the issue, they suffer from the drawback of neglecting the majority of visual tokens. This article presents a novel mixed attention operation that subtly fuses the random, spatial, and temporal attention mechanisms. The proposed random attention stochastically samples video tokens in a simple yet effective way, complementing other attention methods. Furthermore, since the attention operation concentrates on learning long-distance relationships, we employ the channel shift operation to encode short-term temporal characteristics. Our model can provide more comprehensive motion representations thanks to the amalgamation of these techniques. Experimental results show that the proposed method produces competitive action recognition results with low computational overhead on both large-scale and small-scale public video datasets.
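
As a rough illustration of the two ideas named in the abstract, the following NumPy sketch shifts a fraction of channels along the time axis and averages spatial, temporal, and randomly sampled attention. It is not the authors' implementation. The function names, the `fold_div` parameter, the zero padding at the clip boundaries, the absence of learned projections, and the plain averaging used to fuse the three attention outputs are all assumptions made for this example; the abstract does not specify how the mechanisms are fused.

```python
import numpy as np


def temporal_channel_shift(x, fold_div=8):
    """Shift a fraction of channels along time (assumed layout: (T, N, C)).

    The first C // fold_div channels of frame t are taken from frame t - 1,
    the next C // fold_div channels from frame t + 1, and the rest are kept.
    Positions with no neighbouring frame are zero-filled (an assumption).
    """
    fold = x.shape[-1] // fold_div
    out = np.zeros_like(x)
    out[1:, :, :fold] = x[:-1, :, :fold]                   # from previous frame
    out[:-1, :, fold:2 * fold] = x[1:, :, fold:2 * fold]   # from next frame
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # unchanged channels
    return out


def attend(q, k, v):
    """Scaled dot-product attention without learned projections."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def mixed_attention(x, num_random, rng):
    """Average spatial, temporal, and random attention over tokens (T, N, C)."""
    T, N, C = x.shape

    # Spatial: each token attends to the N tokens of its own frame.
    spatial = attend(x, x, x)

    # Temporal: each token attends to the same position across the T frames.
    xt = x.transpose(1, 0, 2)
    temporal = attend(xt, xt, xt).transpose(1, 0, 2)

    # Random: every token attends to one shared random subset of all T*N tokens.
    tokens = x.reshape(T * N, C)
    idx = rng.choice(T * N, size=num_random, replace=False)
    sampled = tokens[idx]
    random_part = attend(tokens, sampled, sampled).reshape(T, N, C)

    # Fusion by simple averaging is a placeholder for the unspecified fusion.
    return (spatial + temporal + random_part) / 3.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video_tokens = rng.standard_normal((4, 6, 8))  # 4 frames, 6 tokens, 8 channels
    shifted = temporal_channel_shift(video_tokens, fold_div=4)
    mixed = mixed_attention(shifted, num_random=5, rng=rng)
    print(mixed.shape)
```

In this sketch the random branch scores each query against only `num_random` sampled tokens rather than all `T * N` tokens, which is where the savings over full 3D attention would come from, while the shift itself involves no multiplications.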
Pages: 20