TAM: Temporal Adaptive Module for Video Recognition

Cited by: 164
Authors
Liu, Zhaoyang [1, 2]
Wang, Limin [1]
Wu, Wayne [2]
Qian, Chen [2]
Lu, Tong [1]
Affiliations
[1] Nanjing University, State Key Laboratory for Novel Software Technology, Nanjing, People's Republic of China
[2] SenseTime Research, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV48922.2021.01345
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video data exhibits complex temporal dynamics due to factors such as camera motion, speed variation, and differing activities. To capture these diverse motion patterns effectively, this paper presents a temporal adaptive module (TAM) that generates video-specific temporal kernels from each video's own feature map. TAM uses a two-level adaptive modeling scheme that decouples the dynamic kernel into a location-sensitive importance map and a location-invariant aggregation weight. The importance map is learned in a local temporal window to capture short-term information, while the aggregation weight is generated from a global view with a focus on long-term structure. TAM is a modular block that can be integrated into 2D CNNs to yield a powerful video architecture (TANet) at very small extra computational cost. Extensive experiments on the Kinetics-400 and Something-Something datasets demonstrate that TAM consistently outperforms other temporal modeling methods and achieves state-of-the-art performance at similar complexity. The code is available at https://github.com/liu-zhy/temporal-adaptive-module.
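The two-level scheme described in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation (that is in the repository linked above). It is a simplified illustration under stated assumptions: the function name `temporal_adaptive_sketch`, the weight arguments `w_local`, `w1`, `w2`, the tensor layout `x[c][t][p]`, and the use of one fixed filter plus two small dense layers are all hypothetical choices made for readability. A local branch turns a short temporal window into a per-frame importance value, a global branch turns the whole temporal sequence into one kernel per channel, and that kernel is then applied at every temporal location.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    total = sum(e)
    return [a / total for a in e]

def temporal_adaptive_sketch(x, w_local, w1, w2):
    """x[c][t][p]: C channels, T frames, P flattened spatial positions.

    w_local: hypothetical local filter over a short temporal window.
    w1 (hidden x T) and w2 (K x hidden): hypothetical dense weights that
    generate a temporal kernel of size K from the whole sequence.
    """
    C, T, P = len(x), len(x[0]), len(x[0][0])
    K = len(w2)
    r_loc, r_agg = len(w_local) // 2, K // 2
    out, kernels = [], []
    for c in range(C):
        # Spatial average pooling: one scalar descriptor per frame.
        s = [sum(x[c][t]) / P for t in range(T)]

        # Local branch: importance depends on t (location sensitive) and
        # only sees a short window around t (zero padded at the borders).
        imp = []
        for t in range(T):
            acc = 0.0
            for j, w in enumerate(w_local):
                u = t + j - r_loc
                if 0 <= u < T:
                    acc += w * s[u]
            imp.append(sigmoid(acc))

        # Global branch: one kernel per channel, computed from all T frames
        # and shared by every temporal location (location invariant).
        hidden = [max(0.0, sum(w * a for w, a in zip(row, s))) for row in w1]
        theta = softmax([sum(w * h for w, h in zip(row, hidden)) for row in w2])
        kernels.append(theta)

        # Rescale frames by importance, then aggregate over time with theta.
        z = [[imp[t] * v for v in x[c][t]] for t in range(T)]
        y = []
        for t in range(T):
            frame = [0.0] * P
            for k in range(K):
                u = t + k - r_agg
                if 0 <= u < T:
                    for p in range(P):
                        frame[p] += theta[k] * z[u][p]
            y.append(frame)
        out.append(y)
    return out, kernels

# Toy usage with random input and random (untrained) weights.
random.seed(0)
C, T, P, H, K = 2, 4, 3, 8, 3
x = [[[random.random() for _ in range(P)] for _ in range(T)] for _ in range(C)]
w_local = [0.25, 0.5, 0.25]
w1 = [[random.uniform(-1, 1) for _ in range(T)] for _ in range(H)]
w2 = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(K)]
y, kernels = temporal_adaptive_sketch(x, w_local, w1, w2)
```

The output `y` has the same layout as the input, which is what lets a block of this kind sit inside an existing 2D CNN. Each entry of `kernels` is a softmax-normalized temporal kernel that depends on the input video, so two different inputs generally yield different kernels even with identical weights.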
Pages: 13688-13698
Number of pages: 11