Spatio-Temporal Fusion Networks for Action Recognition

Cited by: 5
Authors
Cho, Sangwoo [1 ]
Foroosh, Hassan [1 ]
Affiliation
[1] Univ Cent Florida, Orlando, FL 32816 USA
Source
Funding
US National Science Foundation;
Keywords
Action recognition; Spatio-temporal fusion; Temporal dynamics;
DOI
10.1007/978-3-030-20887-5_22
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video-based CNN work has focused on effective ways to fuse appearance and motion networks, but it typically makes little use of temporal information across video frames. In this work, we present a novel spatio-temporal fusion network (STFN) that integrates the temporal dynamics of appearance and motion information from entire videos. The captured temporal dynamics are then aggregated into a better video-level representation and learned via end-to-end training. The spatio-temporal fusion network consists of two sets of Residual Inception blocks that extract temporal dynamics and a fusion connection for appearance and motion features. The benefits of STFN are: (a) it captures local and global temporal dynamics of complementary data to learn video-wide information; and (b) it is applicable to any network for video classification to boost performance. We explore a variety of design choices for STFN and use ablation studies to verify how they affect network performance. We perform experiments on two challenging human activity datasets, UCF101 and HMDB51, and achieve state-of-the-art results with the best network.
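As a rough illustration of the idea in the abstract, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes each stream is a list of per-frame feature vectors, uses fixed averaging filters of several window sizes as a stand-in for learned temporal convolutions in a Residual Inception block, fuses the two streams by element-wise sum, and pools over time. All function names, the kernel sizes, and the choice of sum fusion are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
FrameSeq = List[Vector]  # one feature vector per sampled frame


def temporal_filter(seq: FrameSeq, kernel: int) -> FrameSeq:
    """Average each frame with its neighbours in a window of width `kernel`.

    Assumption: a fixed averaging filter stands in for a learned 1D temporal
    convolution. Windows are truncated at the sequence boundaries.
    """
    half = kernel // 2
    out = []
    for t in range(len(seq)):
        window = seq[max(0, t - half): t + half + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def residual_multiscale_block(seq: FrameSeq,
                              kernels: Tuple[int, ...] = (1, 3, 5)) -> FrameSeq:
    """Inception-style block: parallel temporal filters of several widths,
    averaged together, plus a residual (identity) connection."""
    branches = [temporal_filter(seq, k) for k in kernels]
    out = []
    for t, frame in enumerate(seq):
        mixed = [sum(b[t][d] for b in branches) / len(branches)
                 for d in range(len(frame))]
        out.append([x + m for x, m in zip(frame, mixed)])
    return out


def fuse(appearance: FrameSeq, motion: FrameSeq) -> FrameSeq:
    """Fuse the two streams frame by frame (assumed here: element-wise sum)."""
    return [[a + m for a, m in zip(fa, fm)]
            for fa, fm in zip(appearance, motion)]


def video_representation(appearance: FrameSeq, motion: FrameSeq) -> Vector:
    """Extract temporal dynamics per stream, fuse, then average over time."""
    a = residual_multiscale_block(appearance)
    m = residual_multiscale_block(motion)
    fused = fuse(a, m)
    return [sum(col) / len(fused) for col in zip(*fused)]


if __name__ == "__main__":
    # Toy input: 4 frames, 2-dimensional features per stream.
    rgb = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
    flow = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
    print(video_representation(rgb, flow))
```

In a trainable version the averaging filters would be replaced by learned convolutions and the whole pipeline optimized end to end; the sketch only shows the data flow from per-frame features to a single video-level vector.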
Pages: 347-364
Number of pages: 18
Related papers
50 in total
  • [1] Spatio-Temporal Attention Networks for Action Recognition and Detection
    Li, Jun
    Liu, Xianglong
    Zhang, Wenxuan
    Zhang, Mingyuan
    Song, Jingkuan
    Sebe, Nicu
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (11) : 2990 - 3001
  • [2] Action recognition with spatio-temporal augmented descriptor and fusion method
    Li, Lijun
    Dai, Shuling
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (12) : 13953 - 13969
  • [3] Spatio-Temporal Information Fusion and Filtration for Human Action Recognition
    Zhang, Man
    Li, Xing
    Wu, Qianhan
    [J]. SYMMETRY-BASEL, 2023, 15 (12):
  • [4] Unified Spatio-Temporal Attention Networks for Action Recognition in Videos
    Li, Dong
    Yao, Ting
    Duan, Ling-Yu
    Mei, Tao
    Rui, Yong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (02) : 416 - 428
  • [5] Action Recognition in Videos with Spatio-Temporal Fusion 3D Convolutional Neural Networks
    Wang, Y.
    Shen, X. J.
    Chen, H. P.
    Sun, J. X.
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, 2021, 31 (03) : 580 - 587
  • [6] Action Recognition in Videos with Spatio-Temporal Fusion 3D Convolutional Neural Networks
    Y. Wang
    X. J. Shen
    H. P. Chen
    J. X. Sun
    [J]. Pattern Recognition and Image Analysis, 2021, 31 : 580 - 587
  • [7] Erratum to: Action recognition with spatio-temporal augmented descriptor and fusion method
    Lijun Li
    Shuling Dai
    [J]. Multimedia Tools and Applications, 2017, 76 : 13971 - 13971
  • [8] Spatio-temporal Multi-level Fusion for Human Action Recognition
    Manh-Hung Lu
    Thi-Oanh Nguyen
    [J]. SOICT 2019: PROCEEDINGS OF THE TENTH INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY, 2019, : 298 - 305
  • [9] Human Action Recognition in Video by Fusion of Structural and Spatio-temporal Features
    Borzeshi, Ehsan Zare
    Concha, Oscar Perez
    Piccardi, Massimo
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2012, 7626 : 474 - 482
  • [10] Spatio-Temporal Action Graph Networks
    Herzig, Roei
    Levi, Elad
    Xu, Huijuan
    Gao, Hang
    Brosh, Eli
    Wang, Xiaolong
    Globerson, Amir
    Darrell, Trevor
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 2347 - 2356