Action Recognition with Bootstrapping based Long-range Temporal Context Attention

Cited by: 7
Authors
Liu, Ziming [1]
Gao, Guangyu [1]
Qin, A. K. [2]
Wu, Tong [1]
Liu, Chi Harold [1]
Affiliations
[1] Beijing Institute of Technology, Beijing, People's Republic of China
[2] Swinburne University of Technology, Melbourne, Victoria, Australia
Funding
Australian Research Council; National Natural Science Foundation of China
Keywords
Action recognition; Context; Self-attention; Bootstrapping attention
DOI
10.1145/3343031.3350916
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Actions typically involve complex visual variations across a long, redundant video sequence. Instead of focusing on a limited range of the sequence, i.e. convolution over adjacent frames, this paper proposes an action recognition approach with bootstrapping-based long-range temporal context attention. Specifically, because local regions vary visually across frames, we capture temporal context with the proposed Temporal Pixels based Parallel-head Attention (TPPA) block. In TPPA, we apply the self-attention mechanism between local regions at the same position across temporal frames to capture their interactions. Meanwhile, to deal with video redundancy and capture long-range context, TPPA is extended to the Random Frames based Bootstrapping Attention (RFBA) framework. Because the bootstrap-sampled frames follow the same distribution as the whole video sequence, RFBA not only captures longer temporal context with only a few sampled frames but also obtains a comprehensive representation through multiple rounds of sampling. Furthermore, we apply this temporal context attention to image-based action recognition by transforming an image into a "pseudo video" through spatial shifts. Finally, we conduct extensive experiments and empirical evaluations on two of the most popular datasets: UCF101 for videos and Stanford40 for images. In particular, our approach achieves a top-1 accuracy of 91.7% on UCF101 and an mAP of 90.9% on Stanford40.
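The two ideas named in the abstract, self-attention across frames at a fixed spatial position and repeated random frame sampling, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the identity query/key/value projections, the single attention head, sampling with replacement, time-sorted indices, and mean pooling are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def sample_frames(num_frames, num_samples, rng):
    """Draw frame indices uniformly at random (with replacement, an assumption)
    and return them in temporal order."""
    idx = rng.integers(0, num_frames, size=num_samples)
    return np.sort(idx)


def temporal_pixel_attention(x):
    """Self-attention across frames, computed separately at each spatial position.

    x has shape (T, H, W, C). Queries, keys, and values are the raw features
    (identity projections, an assumption); a real block would learn them.
    """
    T, H, W, C = x.shape
    # One length-T sequence of C-dimensional features per spatial position.
    seq = x.transpose(1, 2, 0, 3).reshape(H * W, T, C)
    # Scaled dot-product scores between frames: shape (H*W, T, T).
    scores = seq @ seq.transpose(0, 2, 1) / np.sqrt(C)
    # Numerically stable softmax over the key (frame) axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ seq  # (H*W, T, C)
    return out.reshape(H, W, T, C).transpose(2, 0, 1, 3)


def bootstrap_attention_features(video, num_samples, num_rounds, seed=0):
    """Average attended features over several random frame samples.

    Mean pooling over time, space, and rounds is a hypothetical aggregation
    chosen only to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    feats = []
    for _ in range(num_rounds):
        idx = sample_frames(video.shape[0], num_samples, rng)
        attended = temporal_pixel_attention(video[idx])
        feats.append(attended.mean(axis=(0, 1, 2)))  # one C-vector per round
    return np.mean(feats, axis=0)


if __name__ == "__main__":
    # Toy "video": 30 frames of 4x4 feature maps with 8 channels.
    video = np.random.default_rng(1).normal(size=(30, 4, 4, 8))
    feature = bootstrap_attention_features(video, num_samples=5, num_rounds=3)
    print(feature.shape)
```

Each round attends over only `num_samples` frames, so the attention cost depends on the sample size rather than the video length, while repeated rounds cover more of the sequence.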
Pages: 583-591
Number of pages: 9