TAM: Temporal Adaptive Module for Video Recognition

Cited by: 164
Authors
Liu, Zhaoyang [1, 2]
Wang, Limin [1]
Wu, Wayne [2]
Qian, Chen [2]
Lu, Tong [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] SenseTime Res, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV48922.2021.01345
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video data exhibits complex temporal dynamics due to factors such as camera motion, speed variation, and different activities. To effectively capture these diverse motion patterns, this paper presents a new temporal adaptive module (TAM) that generates video-specific temporal kernels from the video's own feature map. TAM adopts a two-level adaptive modeling scheme that decouples the dynamic kernel into a location-sensitive importance map and a location-invariant aggregation weight. The importance map is learned in a local temporal window to capture short-term information, while the aggregation weight is generated from a global view with a focus on long-term structure. TAM is a modular block and can be integrated into 2D CNNs to yield a powerful video architecture (TANet) at very small extra computational cost. Extensive experiments on the Kinetics-400 and Something-Something datasets demonstrate that TAM consistently outperforms other temporal modeling methods and achieves state-of-the-art performance at similar complexity. The code is available at https://github.com/liu-zhy/temporal-adaptive-module.
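The two-level scheme described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that); it is a simplified illustration under stated assumptions: the function name `temporal_adaptive_sketch`, the weight arrays `w_local` and `w_global`, the 3-frame local window, and the single-layer branches are all hypothetical choices made for brevity. The local branch produces a per-frame importance map from a short temporal window, and the global branch produces one temporal kernel per channel from the whole clip, which is then shared across all temporal positions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def temporal_adaptive_sketch(x, w_local, w_global):
    """Illustrative two-level temporal modeling (hypothetical, simplified).

    x        : features of shape (C, T, H, W)
    w_local  : (C, 3) per-channel filter over a 3-frame window (assumed size)
    w_global : (T, K) linear map from the full clip to K kernel taps, K odd
    """
    C, T, H, W = x.shape
    K = w_global.shape[1]

    # Squeeze the spatial dimensions: one descriptor per channel and frame.
    desc = x.mean(axis=(2, 3))                                   # (C, T)

    # Local branch: location-sensitive importance map from a short window.
    padded = np.pad(desc, ((0, 0), (1, 1)))                      # (C, T + 2)
    local = sum(w_local[:, k:k + 1] * padded[:, k:k + T] for k in range(3))
    importance = sigmoid(local)                                  # (C, T)

    # Global branch: location-invariant aggregation weight from all T frames.
    kernel = softmax(desc @ w_global, axis=1)                    # (C, K)

    # Re-weight each frame, then aggregate along time with the
    # video-specific kernel, which is shared across temporal positions.
    y = x * importance[:, :, None, None]
    half = K // 2
    y_pad = np.pad(y, ((0, 0), (half, half), (0, 0), (0, 0)))
    out = sum(kernel[:, k, None, None, None] * y_pad[:, k:k + T]
              for k in range(K))
    return out, importance, kernel


# Usage with random inputs and random (untrained) weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 5, 5))        # C=4, T=8, H=W=5
w_local = rng.standard_normal((4, 3))
w_global = rng.standard_normal((8, 3))
out, importance, kernel = temporal_adaptive_sketch(x, w_local, w_global)
```

The output keeps the input shape, so a block of this kind could sit between the layers of a 2D CNN, which is consistent with the abstract's description of TAM as a modular block.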
Pages: 13688-13698
Page count: 11
Related papers
50 in total
  • [1] TRM: Temporal Relocation Module for Video Recognition
    Qian, Yijun
    Kang, Guoliang
    Yu, Lijun
    Liu, Wenhe
    Hauptmann, Alexander G.
    2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022), 2022, : 151 - 160
  • [2] TOWARDS TEMPORAL ADAPTIVE REPRESENTATION FOR VIDEO ACTION RECOGNITION
    Cai, Junjie
    Yu, Jie
    Imai, Francisco
    Tian, Qi
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 4155 - 4159
  • [3] Video Action Recognition Based on Spatio-temporal Feature Pyramid Module
    Gong, Suming
    Chen, Ying
    2020 13TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2020), 2020, : 338 - 341
  • [4] STAM: a spatio-temporal adaptive module for improving static convolutions in action recognition
    Li, Wei
    Gong, Weijun
    Qian, Yurong
    Tian, Haichen
    VISUAL COMPUTER, 2024, 40 (09): : 6279 - 6293
  • [5] Video behavior recognition based on actional-structural graph convolution and temporal extension module
    Xu, Hui
    Kong, Jun
    Liang, Mengyao
    Sun, Hui
    Qi, Miao
    ELECTRONIC RESEARCH ARCHIVE, 2022, 30 (11): : 4157 - 4177
  • [6] Deep Fusion Module for Video Action Recognition
    Li, Yunyao
    Zheng, Zihao
    Zhou, Mingliang
    Yang, Guangchao
    Wei, Xuekai
    Pu, Huayan
    Luo, Jun
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2024, 33 (14)
  • [7] An End to End Framework With Adaptive Spatio-Temporal Attention Module for Human Action Recognition
    Liu, Shaocan
    Ma, Xin
    Wu, Hanbo
    Li, Yibin
    IEEE ACCESS, 2020, 8 : 47220 - 47231
  • [8] Temporal Bottleneck Attention for Video Recognition
    Carvalho, Schubert R.
    Bertagnolli, Nicolas M.
    Folkman, Tyler
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 1400 - 1406
  • [9] VIDEO DENOISING WITH ADAPTIVE TEMPORAL AVERAGING
    Prasath, V. B. Surya
    ENGINEERING REVIEW, 2019, 39 (03) : 243 - 247
  • [10] ADAPTIVE TEMPORAL COMPRESSIVE SENSING FOR VIDEO
    Yuan, Xin
    Yang, Jianbo
    Llull, Patrick
    Liao, Xuejun
    Sapiro, Guillermo
    Brady, David J.
    Carin, Lawrence
    2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), 2013, : 14 - 18