Temporal Bilinear Networks for Video Action Recognition

Citations: 0
Authors
Li, Yanghao [1 ]
Song, Sijie [1 ]
Li, Yuqi [1 ]
Liu, Jiaying [1 ]
Affiliation
[1] Peking Univ, Beijing, Peoples R China
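Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes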
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Temporal modeling in videos is a fundamental yet challenging problem in computer vision. In this paper, we propose a novel Temporal Bilinear (TB) model to capture the temporal pairwise feature interactions between adjacent frames. Compared with existing temporal methods that are limited to linear transformations, our TB model considers explicit quadratic bilinear transformations in the temporal domain for motion evolution and sequential relation modeling. We further leverage a factorized bilinear model with linear complexity and a bottleneck network design to build our TB blocks, which also limits the parameter count and computation cost. We consider two schemes for combining TB blocks with the original 2D spatial convolutions, namely wide and deep Temporal Bilinear Networks (TBNs). Finally, we perform experiments on several widely adopted datasets, including Kinetics, UCF101 and HMDB51. The effectiveness of our TBNs is validated by comprehensive ablation analyses and comparisons with various state-of-the-art methods.
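The abstract does not state the TB formulation, so the following is only a generic illustration of what a factorized bilinear interaction between adjacent frames can look like, not the authors' method. It assumes each frame is a feature vector of length C and that a dense bilinear weight W is replaced by a rank-R product of two hypothetical factor matrices U and V (both R x C). All names and sizes here are assumptions chosen for the example.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def full_bilinear(x_t, x_next, W):
    """Dense bilinear form x_t^T W x_next; W holds C*C weights."""
    return sum(x_t[i] * W[i][j] * x_next[j]
               for i in range(len(x_t))
               for j in range(len(x_next)))

def factorized_bilinear(x_t, x_next, U, V):
    """Low-rank form: sum over r of (U[r] . x_t) * (V[r] . x_next).

    This equals x_t^T (U^T V) x_next but needs only 2*R*C weights,
    which is linear in C for a fixed rank R.
    """
    return sum(dot(u_r, x_t) * dot(v_r, x_next) for u_r, v_r in zip(U, V))

def temporal_pairwise(frames, U, V):
    """One scalar response per pair of adjacent frames (t, t+1)."""
    return [factorized_bilinear(frames[t], frames[t + 1], U, V)
            for t in range(len(frames) - 1)]

# Assumed toy sizes: C features per frame, rank R, T frames.
random.seed(0)
C, R, T = 4, 2, 5
U = [[random.gauss(0, 1) for _ in range(C)] for _ in range(R)]
V = [[random.gauss(0, 1) for _ in range(C)] for _ in range(R)]
frames = [[random.gauss(0, 1) for _ in range(C)] for _ in range(T)]

# Dense matrix implied by the factors: W[i][j] = sum_r U[r][i] * V[r][j].
W = [[sum(U[r][i] * V[r][j] for r in range(R)) for j in range(C)]
     for i in range(C)]

responses = temporal_pairwise(frames, U, V)
```

In this sketch the factorized and dense forms give the same response for every adjacent pair, while the factorized form stores 2RC weights instead of C^2. A real block would produce many such responses per spatial location and learn U and V by backpropagation.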
Pages: 8674-8681
Page count: 8