Action Recognition with Bootstrapping based Long-range Temporal Context Attention

Cited by: 7
Authors
Liu, Ziming [1]
Gao, Guangyu [1]
Qin, A. K. [2]
Wu, Tong [1]
Liu, Chi Harold [1]
Affiliations
[1] Beijing Institute of Technology, Beijing, People's Republic of China
[2] Swinburne University of Technology, Melbourne, VIC, Australia
Funding
Australian Research Council; National Natural Science Foundation of China
Keywords
Action recognition; Context; Self-attention; Bootstrapping attention
DOI
10.1145/3343031.3350916
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Actions always involve complex visual variations over long-range, redundant video sequences. Instead of focusing on limited-range sequences, such as convolution over adjacent frames, in this paper we propose an action recognition approach with bootstrapping-based long-range temporal context attention. Specifically, because local regions vary visually across frames, we aim to capture temporal context with the proposed Temporal Pixels based Parallel-head Attention (TPPA) block. In TPPA, we apply self-attention between local regions at the same spatial position across frames to capture their interactions. Meanwhile, to deal with video redundancy and capture long-range context, we extend TPPA to the Random Frames based Bootstrapping Attention (RFBA) framework. Since the bootstrap-sampled frames follow the same distribution as the whole video sequence, RFBA not only captures longer temporal context from only a few sampled frames but also obtains a comprehensive representation through multiple rounds of sampling. Furthermore, we explore applying this temporal context attention to image-based action recognition by transforming an image into a "pseudo video" through spatial shifts. Finally, we conduct extensive experiments and empirical evaluations on two of the most popular datasets: UCF101 for videos and Stanford40 for images. In particular, our approach achieves a top-1 accuracy of 91.7% on UCF101 and an mAP of 90.9% on Stanford40.
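The abstract describes TPPA and RFBA only at a high level. The NumPy sketch below illustrates the two ideas under stated assumptions; it is not the authors' implementation. Assumed, not taken from the paper: the input is a backbone feature map of shape (T, H, W, C); random matrices stand in for learned query/key/value projections; sampled indices are sorted; a residual connection, mean pooling, and averaging over sampling rounds are used; and the pseudo video uses circular shifts via `np.roll`. All function names are hypothetical.

```python
import numpy as np


def bootstrap_frame_indices(num_frames, k, rng):
    """Bootstrap sample: k frame indices drawn uniformly WITH replacement.
    Sorting keeps the sampled clip in temporal order (an assumption)."""
    return np.sort(rng.integers(0, num_frames, size=k))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_pixel_attention(feats, w_q, w_k, w_v, num_heads):
    """Multi-head self-attention over time, computed separately at every spatial
    position: the tokens of one sequence are the C-dim features at a fixed
    (h, w) location in each frame. feats: (T, H, W, C) -> same shape."""
    t, h, w, c = feats.shape
    assert c % num_heads == 0, "channels must split evenly across heads"
    d = c // num_heads
    # One length-T sequence per spatial position: (H*W, T, C).
    x = feats.transpose(1, 2, 0, 3).reshape(h * w, t, c)

    def split_heads(a):
        # (H*W, T, C) -> (H*W, heads, T, d)
        return a.reshape(h * w, t, num_heads, d).transpose(0, 2, 1, 3)

    q, k, v = (split_heads(x @ m) for m in (w_q, w_k, w_v))
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))  # (H*W, heads, T, T)
    out = attn @ v                                           # (H*W, heads, T, d)
    # Merge heads, restore (T, H, W, C), and add a residual connection (assumption).
    out = out.transpose(0, 2, 1, 3).reshape(h, w, t, c).transpose(2, 0, 1, 3)
    return feats + out


def bootstrap_attention_features(video_feats, params, k, rounds, num_heads, rng):
    """RFBA-style sketch: draw `rounds` bootstrap clips of k frames, apply the
    temporal attention to each clip, mean-pool each clip to a C-dim vector,
    then average the vectors over rounds (pooling choices are assumptions)."""
    pooled = []
    for _ in range(rounds):
        idx = bootstrap_frame_indices(video_feats.shape[0], k, rng)
        clip = temporal_pixel_attention(video_feats[idx], *params, num_heads)
        pooled.append(clip.mean(axis=(0, 1, 2)))  # average over T, H, W
    return np.mean(pooled, axis=0)


def pseudo_video(image, shifts):
    """Turn one image (H, W, C) into a 'pseudo video' by stacking spatially
    shifted copies; np.roll wraps around at the borders (an assumption)."""
    return np.stack([np.roll(image, shift=(dy, dx), axis=(0, 1)) for dy, dx in shifts])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, H, W, C, HEADS = 32, 7, 7, 64, 8
    params = tuple(rng.standard_normal((C, C)) * 0.1 for _ in range(3))  # W_q, W_k, W_v
    video = rng.standard_normal((T, H, W, C))
    feat = bootstrap_attention_features(video, params, k=8, rounds=4, num_heads=HEADS, rng=rng)
    print(feat.shape)  # (64,)
    clip = pseudo_video(rng.standard_normal((H, W, C)), [(0, 0), (1, 0), (0, 1), (1, 1)])
    print(clip.shape)  # (4, 7, 7, 64)
```

With k much smaller than T, each round attends over only k frames, yet the indices are drawn from the whole video; repeating the draw and averaging mirrors the abstract's point that multiple sampling rounds give a more comprehensive representation.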
Pages: 583-591
Page count: 9