AGPN: Action Granularity Pyramid Network for Video Action Recognition

Cited by: 21
Authors
Chen, Yatong [1 ]
Ge, Hongwei [1 ]
Liu, Yuxuan [1 ]
Cai, Xinye [1 ]
Sun, Liang [1 ]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video action recognition; pyramid network; multi-scale; multi-granularity; REPRESENTATIONS;
DOI
10.1109/TCSVT.2023.3235522
Chinese Library Classification
TM [Electrotechnics]; TN [Electronics & Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract
Video action recognition is a fundamental task for video understanding. Action recognition in complex spatio-temporal contexts generally requires fusing action information at multiple granularities. However, existing works do not consider spatio-temporal information modeling and fusion from the perspective of action granularity. To address this problem, this paper proposes an Action Granularity Pyramid Network (AGPN) for action recognition, which can be flexibly integrated into 2D backbone networks. The core module is the Action Granularity Pyramid Module (AGPM), a hierarchical pyramid structure with residual connections, established to fuse multi-granularity spatio-temporal action information. From the top level to the bottom of the pyramid, the receptive field decreases and the action granularity becomes more refined. To enrich the temporal information of the inputs, a Multiple Frame Rate Module (MFM) is proposed to mix different frame rates at a fine-grained, pixel-wise level. Moreover, a Spatio-temporal Anchor Module (SAM) is employed to fix spatio-temporal feature anchors, improving the effectiveness of feature extraction. We conduct extensive experiments on three large-scale action recognition datasets: Something-Something V1 & V2 and Kinetics-400. The results demonstrate that the proposed AGPN outperforms state-of-the-art methods on video action recognition.
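The Multiple Frame Rate Module described above mixes clips sampled at different frame rates at a pixel-wise level. The following is a rough illustrative sketch only, not the paper's implementation: the actual module operates on learned CNN feature maps, and the names `subsample`, `pixelwise_mix`, and the blending coefficient `mix_weight` are hypothetical stand-ins introduced for this example.

```python
def subsample(frames, stride):
    """Simulate a lower frame rate by keeping every `stride`-th frame."""
    return frames[::stride]

def pixelwise_mix(fast, slow, mix_weight=0.5):
    """Blend a fast-rate clip with a slow-rate clip, pixel by pixel.

    Each frame is a flat list of pixel values; the slow clip is aligned
    to the fast timeline by nearest-neighbour indexing.
    """
    mixed = []
    for t, frame in enumerate(fast):
        slow_frame = slow[min(t * len(slow) // len(fast), len(slow) - 1)]
        mixed.append([
            mix_weight * p_fast + (1.0 - mix_weight) * p_slow
            for p_fast, p_slow in zip(frame, slow_frame)
        ])
    return mixed

# Toy clip: 4 frames of 3 "pixels" each, pixel value = frame index.
clip = [[float(t)] * 3 for t in range(4)]
slow_clip = subsample(clip, 2)          # keeps frames t = 0 and t = 2
mixed = pixelwise_mix(clip, slow_clip)  # same length as the fast clip
```

The key point the sketch captures is that the two temporal streams are combined per pixel rather than by late fusion of whole-clip features.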
Pages: 3912-3923
Page count: 12
Related Papers
50 items
  • [31] Action recognition on continuous video
    Y. L. Chang
    C. S. Chan
    P. Remagnino
    Neural Computing and Applications, 2021, 33 : 1233 - 1243
  • [32] Video Action Retrieval Using Action Recognition Model
    Iinuma, Yuko
    Satoh, Shin'ichi
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 603 - 606
  • [33] Action-Gons: Action Recognition with a Discriminative Dictionary of Structured Elements with Varying Granularity
    Wang, Yuwang
    Wang, Baoyuan
    Yu, Yizhou
    Dai, Qionghai
    Tu, Zhuowen
    COMPUTER VISION - ACCV 2014, PT V, 2015, 9007 : 259 - 274
  • [34] Video Human Action Recognition by the Fusion of Significant Motion and Coarse-fine Temporal Granularity Features
    Xu, Kaishi
    Cao, Hanhua
    2024 5TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATION, ICCEA 2024, 2024, : 1183 - 1186
  • [35] Study on video action recognition based on augment negative example multi-granularity discrimination model
    Liu, Liangzhen
    Yang, Yang
    Xia, Yingjie
    Kuang, Li
    Tongxin Xuebao/Journal on Communications, 45 (12): : 28 - 43
  • [36] An attention mechanism based convolutional LSTM network for video action recognition
    Ge, Hongwei
    Yan, Zehang
    Yu, Wenhao
    Sun, Liang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (14) : 20533 - 20556
  • [37] Video spatiotemporal mapping for human action recognition by convolutional neural network
    Amin Zare
    Hamid Abrishami Moghaddam
    Arash Sharifi
    Pattern Analysis and Applications, 2020, 23 : 265 - 279
  • [38] CHANNEL-WISE TEMPORAL ATTENTION NETWORK FOR VIDEO ACTION RECOGNITION
    Lei, Jianjun
    Jia, Yalong
    Peng, Bo
    Huang, Qingming
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 562 - 567
  • [39] Diverse Features Fusion Network for video-based action recognition
    Deng, Haoyang
    Kong, Jun
    Jiang, Min
    Liu, Tianshan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 77
  • [40] MLENet: Multi-Level Extraction Network for video action recognition
    Wang, Fan
    Li, Xinke
    Xiong, Han
    Mo, Haofan
    Li, Yongming
    PATTERN RECOGNITION, 2024, 154