Action Recognition with Bootstrapping based Long-range Temporal Context Attention

Cited by: 7
Authors
Liu, Ziming [1]
Gao, Guangyu [1]
Qin, A. K. [2]
Wu, Tong [1]
Liu, Chi Harold [1]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Swinburne Univ Technol, Melbourne, Vic, Australia
Funding
Australian Research Council; National Natural Science Foundation of China
Keywords
Action recognition; Context; Self-attention; Bootstrapping attention
DOI
10.1145/3343031.3350916
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Actions involve complex visual variations across a long, redundant video sequence. Instead of focusing on a limited range of the sequence, e.g. convolution over adjacent frames, in this paper we propose an action recognition approach with bootstrapping-based long-range temporal context attention. Specifically, because the appearance of a local region varies across frames, we aim to capture temporal context by proposing the Temporal Pixels based Parallel-head Attention (TPPA) block. In TPPA, we apply the self-attention mechanism between local regions at the same position across temporal frames to capture how they interact. Meanwhile, to deal with video redundancy and capture long-range context, TPPA is extended to the Random Frames based Bootstrapping Attention (RFBA) framework. Because the bootstrap-sampled frames follow the same distribution as the whole video sequence, RFBA not only captures longer temporal context with only a few sampled frames but also obtains a comprehensive representation through multiple rounds of sampling. Furthermore, we apply this temporal context attention to image-based action recognition by transforming the image into a "pseudo video" with spatial shifts. Finally, we conduct extensive experiments and empirical evaluations on two popular datasets: UCF101 for videos and Stanford40 for images. In particular, our approach achieves a top-1 accuracy of 91.7% on UCF101 and an mAP of 90.9% on Stanford40.
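The two ideas in the abstract, attention across frames at a fixed pixel position and repeated random sampling of a few frames, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single attention head, the identity query/key/value projections, sampling with replacement, sorting the sampled indices, and mean pooling are all assumptions made for illustration only.

```python
import numpy as np

def temporal_pixel_attention(frames):
    """Single-head self-attention over time, applied independently per pixel.

    frames: float array of shape (T, H, W, C); returns the same shape.
    Hypothetical helper; identity Q/K/V projections are an assumption.
    """
    T, H, W, C = frames.shape
    # Treat each spatial position as its own length-T sequence: (H*W, T, C).
    x = frames.reshape(T, H * W, C).transpose(1, 0, 2)
    # Scaled dot-product similarity between frames at the same position.
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(C)       # (H*W, T, T)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over frames
    out = weights @ x                                     # (H*W, T, C)
    return out.transpose(1, 0, 2).reshape(T, H, W, C)

def bootstrap_frame_attention(video, num_frames=4, num_rounds=3, seed=0):
    """Average attended features over several random frame samples.

    video: float array of shape (N, H, W, C); returns a vector of shape (C,).
    Hypothetical helper; sampling with replacement and mean pooling are
    assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = video.shape[0]
    representations = []
    for _ in range(num_rounds):
        # Draw a few frame indices at random and keep them in temporal order.
        idx = np.sort(rng.choice(n, size=num_frames, replace=True))
        attended = temporal_pixel_attention(video[idx])
        # Pool over time and space to get one feature vector per round.
        representations.append(attended.mean(axis=(0, 1, 2)))
    # Combine the rounds into a single representation.
    return np.mean(representations, axis=0)
```

Each round sees only `num_frames` frames, yet the sampled indices can span the whole sequence, which is how a few frames can still cover long-range context in this sketch.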
Pages: 583-591
Page count: 9