Temporal Bilinear Networks for Video Action Recognition

Cited: 0
Authors
Li, Yanghao [1 ]
Song, Sijie [1 ]
Li, Yuqi [1 ]
Liu, Jiaying [1 ]
Institutions
[1] Peking Univ, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Temporal modeling in videos is a fundamental yet challenging problem in computer vision. In this paper, we propose a novel Temporal Bilinear (TB) model to capture the temporal pairwise feature interactions between adjacent frames. Compared with some existing temporal methods, which are limited to linear transformations, our TB model considers explicit quadratic bilinear transformations in the temporal domain for motion evolution and sequential relation modeling. We further leverage the factorized bilinear model, which has linear complexity, and a bottleneck network design to build our TB blocks, which also constrains the parameter count and computation cost. We consider two schemes for incorporating TB blocks with the original 2D spatial convolutions, namely wide and deep Temporal Bilinear Networks (TBN). Finally, we perform experiments on several widely adopted datasets including Kinetics, UCF101 and HMDB51. The effectiveness of our TBNs is validated by comprehensive ablation analyses and comparisons with various state-of-the-art methods.
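The core idea in the abstract — a factorized bilinear interaction between adjacent frame features, computed in a low-rank factor space to avoid quadratic cost — can be sketched as follows. This is a minimal NumPy illustration under assumed shapes; the function name `temporal_bilinear`, the factor rank `K`, and the matrices `U`, `V`, `W` are hypothetical, and the paper's actual TB block additionally uses a bottleneck design and is interleaved with 2D spatial convolutions.

```python
import numpy as np

def temporal_bilinear(x, U, V, W):
    """Factorized temporal bilinear interaction between adjacent frames.

    x: (T, C) per-frame features; U, V: (C, K) factor matrices with
    rank K << C; W: (K, C) output projection. For each adjacent pair
    (x_t, x_{t+1}), the pairwise quadratic interaction is computed as
    an elementwise product (x_t @ U) * (x_{t+1} @ V) in the K-dim
    factor space, avoiding the O(C^2) full bilinear tensor.
    """
    T, C = x.shape
    out = np.zeros((T - 1, C))
    for t in range(T - 1):
        interaction = (x[t] @ U) * (x[t + 1] @ V)  # (K,) factor-space product
        out[t] = interaction @ W                   # project back to (C,)
    return out

# Toy usage: 8 frames with 16-dim features, factor rank 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
U = rng.normal(size=(16, 4))
V = rng.normal(size=(16, 4))
W = rng.normal(size=(4, 16))
out = temporal_bilinear(x, U, V, W)  # shape (7, 16): one output per adjacent pair
```

The factorization is what keeps the cost linear in the channel dimension: a full bilinear map between two C-dim frames would need a C×C×C tensor, while the rank-K version needs only 3·C·K parameters.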
Pages: 8674-8681 (8 pages)
Related Papers (50 items)
  • [21] Gate-Shift Networks for Video Action Recognition
    Sudhakaran, Swathikiran
    Escalera, Sergio
    Lanz, Oswald
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 1099 - 1108
  • [22] Cross-Fiber Spatial-Temporal Co-enhanced Networks for Video Action Recognition
    Wu, Haoze
    Zha, Zheng-Jun
    Wen, Xin
    Chen, Zhenzhong
    Liu, Dong
    Chen, Xuejin
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 620 - 628
  • [23] EmotiEffNet and Temporal Convolutional Networks in Video-based Facial Expression Recognition and Action Unit Detection
    Savchenko, Andrey V.
    Sidorova, Anna P.
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2024, : 4849 - 4859
  • [24] Spatio-temporal Video Autoencoder for Human Action Recognition
    Sousa e Santos, Anderson Carlos
    Pedrini, Helio
    PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2019, : 114 - 123
  • [25] Alignment-guided Temporal Attention for Video Action Recognition
    Zhao, Yizhou
    Li, Zhenyang
    Guo, Xun
    Lu, Yan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [26] Spatial-Temporal Separable Attention for Video Action Recognition
    Guo, Xi
    Hu, Yikun
    Chen, Fang
    Jin, Yuhui
    Qiao, Jian
    Huang, Jian
    Yang, Qin
    2022 INTERNATIONAL CONFERENCE ON FRONTIERS OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, FAIML, 2022, : 224 - 228
  • [27] Video Based Action Recognition using Spatial and Temporal Feature
    Dai, Cheng
    Liu, Xingang
    Zhong, Luhao
    Yu, Tao
    IEEE 2018 INTERNATIONAL CONGRESS ON CYBERMATICS / 2018 IEEE CONFERENCES ON INTERNET OF THINGS, GREEN COMPUTING AND COMMUNICATIONS, CYBER, PHYSICAL AND SOCIAL COMPUTING, SMART DATA, BLOCKCHAIN, COMPUTER AND INFORMATION TECHNOLOGY, 2018, : 635 - 638
  • [28] Interpretable Spatio-temporal Attention for Video Action Recognition
    Meng, Lili
    Zhao, Bo
    Chang, Bo
    Huang, Gao
    Sun, Wei
    Tung, Frederich
    Sigal, Leonid
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 1513 - 1522
  • [29] Exploiting spatio-temporal knowledge for video action recognition
    Zhang, Huigang
    Wang, Liuan
    Sun, Jun
    IET COMPUTER VISION, 2023, 17 (02) : 222 - 230
  • [30] ATTENTIONAL FUSED TEMPORAL TRANSFORMATION NETWORK FOR VIDEO ACTION RECOGNITION
    Yang, Ke
    Wang, Zhiyuan
    Dai, Huadong
    Shen, Tianlong
    Qiao, Peng
    Niu, Xin
    Jiang, Jie
    Li, Dongsheng
    Dou, Yong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4377 - 4381