Temporal Bilinear Networks for Video Action Recognition

Times cited: 0
Authors
Li, Yanghao [1]
Song, Sijie [1]
Li, Yuqi [1]
Liu, Jiaying [1]
Affiliations
[1] Peking University, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Temporal modeling in videos is a fundamental yet challenging problem in computer vision. In this paper, we propose a novel Temporal Bilinear (TB) model to capture the temporal pairwise feature interactions between adjacent frames. Compared with existing temporal methods that are limited to linear transformations, our TB model considers explicit quadratic bilinear transformations in the temporal domain for motion evolution and sequential relation modeling. We further leverage the factorized bilinear model, which has linear complexity, and a bottleneck network design to build our TB blocks, which also constrains the number of parameters and the computation cost. We consider two schemes for combining TB blocks with the original 2D spatial convolutions, namely wide and deep Temporal Bilinear Networks (TBN). Finally, we perform experiments on several widely adopted datasets, including Kinetics, UCF101 and HMDB51. The effectiveness of our TBNs is validated by comprehensive ablation analyses and comparisons with various state-of-the-art methods.
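The abstract does not give the exact formulation, so the following is only a minimal sketch of a generic factorized bilinear interaction between adjacent frames, not the authors' TB block. It assumes each frame is a feature vector of length C and that the dense C x C bilinear weight is replaced by R pairs of length-C vectors (the names `U`, `V`, `R`, and all function names are hypothetical). Under that assumption the parameter count grows linearly in C (2·R·C) instead of quadratically (C·C).

```python
import random

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def bilinear_full(x_t, x_next, W):
    """Dense bilinear form x_t^T W x_next with a C x C matrix W."""
    return sum(x_t[c] * W[c][e] * x_next[e]
               for c in range(len(x_t)) for e in range(len(x_next)))

def bilinear_factorized(x_t, x_next, U, V):
    """Same form with W = sum_r outer(U[r], V[r]) (assumed low-rank factors)."""
    return sum(dot(u, x_t) * dot(v, x_next) for u, v in zip(U, V))

def temporal_bilinear(frames, U, V):
    """One scalar response per adjacent frame pair (t, t+1)."""
    return [bilinear_factorized(frames[t], frames[t + 1], U, V)
            for t in range(len(frames) - 1)]

random.seed(0)
T, C, R = 5, 8, 2  # frames, channels, rank (all illustrative values)
frames = [[random.gauss(0, 1) for _ in range(C)] for _ in range(T)]
U = [[random.gauss(0, 1) for _ in range(C)] for _ in range(R)]
V = [[random.gauss(0, 1) for _ in range(C)] for _ in range(R)]

# Dense matrix implied by the factors, used only to check equivalence.
W = [[sum(U[r][c] * V[r][e] for r in range(R)) for e in range(C)]
     for c in range(C)]

outputs = temporal_bilinear(frames, U, V)
reference = [bilinear_full(frames[t], frames[t + 1], W) for t in range(T - 1)]

params_factorized = 2 * R * C  # linear in C
params_dense = C * C           # quadratic in C
```

The factorized and dense forms agree numerically here because `W` is built from `U` and `V`; in a trained network only the factors would be stored, and a real block would produce many output channels rather than a single scalar per frame pair.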
Pages: 8674-8681
Page count: 8