TAM: Temporal Adaptive Module for Video Recognition

Cited by: 164
Authors
Liu, Zhaoyang [1, 2]
Wang, Limin [1]
Wu, Wayne [2]
Qian, Chen [2]
Lu, Tong [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] SenseTime Res, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV48922.2021.01345
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video data exhibit complex temporal dynamics due to factors such as camera motion, speed variation, and different activities. To capture these diverse motion patterns effectively, this paper presents a new temporal adaptive module (TAM) that generates video-specific temporal kernels from each video's own feature map. TAM adopts a two-level adaptive modeling scheme that decouples the dynamic kernel into a location-sensitive importance map and a location-invariant aggregation weight. The importance map is learned in a local temporal window to capture short-term information, while the aggregation weight is generated from a global view with a focus on long-term structure. TAM is a modular block that can be integrated into 2D CNNs to yield a powerful video architecture (TANet) at very small extra computational cost. Extensive experiments on the Kinetics-400 and Something-Something datasets demonstrate that TAM consistently outperforms other temporal modeling methods and achieves state-of-the-art performance at similar complexity. The code is available at https://github.com/liu-zhy/temporal-adaptive-module.
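The two-level scheme described in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no layer details, so everything here is an assumption made for illustration: the function name `tam_sketch`, the parameters `w_local` and `w_global`, the use of a single linear step per branch, and the sigmoid/softmax normalizations. It is not the authors' implementation, which is available at the repository linked above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tam_sketch(x, w_local, w_global):
    """Illustrative two-level temporal modeling for one video.

    x        : (C, T, H, W) feature map.
    w_local  : (C, k) hypothetical per-channel filter over a local
               temporal window of odd size k.
    w_global : (T, K) hypothetical projection from all T frames to
               K kernel taps (K odd).
    """
    C, T, H, W = x.shape
    k, K = w_local.shape[1], w_global.shape[1]

    # Squeeze the spatial dimensions so both branches see a (C, T) signal.
    pooled = x.mean(axis=(2, 3))

    # Local branch: location-sensitive importance map, one value per
    # (channel, frame), computed from a short temporal window.
    padded = np.pad(pooled, ((0, 0), (k // 2, k // 2)))
    local = np.stack(
        [(padded[:, t:t + k] * w_local).sum(axis=1) for t in range(T)],
        axis=1,
    )
    importance = sigmoid(local)                      # (C, T)

    # Global branch: location-invariant aggregation weight, one K-tap
    # kernel per channel, computed from the whole temporal extent.
    kernel = softmax(pooled @ w_global, axis=1)      # (C, K)

    # Rescale features by the importance map, then aggregate along time
    # with the video-specific kernel (a channel-wise temporal convolution).
    z = x * importance[:, :, None, None]
    z_pad = np.pad(z, ((0, 0), (K // 2, K // 2), (0, 0), (0, 0)))
    y = np.zeros_like(x)
    for j in range(K):
        y += kernel[:, j, None, None, None] * z_pad[:, j:j + T]
    return y, importance, kernel
```

In this sketch the importance map varies along the temporal axis while the aggregation kernel is shared across all temporal positions of a channel, which mirrors the location-sensitive versus location-invariant split named in the abstract.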
Pages: 13688-13698
Page count: 11