AGPN: Action Granularity Pyramid Network for Video Action Recognition

Cited by: 21
Authors
Chen, Yatong [1]
Ge, Hongwei [1]
Liu, Yuxuan [1]
Cai, Xinye [1]
Sun, Liang [1]
Affiliations
[1] Dalian University of Technology, School of Computer Science and Technology, Dalian 116024, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Video action recognition; pyramid network; multi-scale; multi-granularity; REPRESENTATIONS;
DOI
10.1109/TCSVT.2023.3235522
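Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract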
Video action recognition is a fundamental task for video understanding. Action recognition in complex spatio-temporal contexts generally requires fusing action information at multiple granularities. However, existing works do not consider spatio-temporal information modeling and fusion from the perspective of action granularity. To address this problem, this paper proposes an Action Granularity Pyramid Network (AGPN) for action recognition, which can be flexibly integrated into 2D backbone networks. The core module is the Action Granularity Pyramid Module (AGPM), a hierarchical pyramid structure with residual connections that fuses multi-granularity spatio-temporal action information. From the top level to the bottom level of the pyramid, the receptive field decreases and the action granularity becomes finer. To enrich the temporal information of the inputs, a Multiple Frame Rate Module (MFM) is proposed to mix different frame rates at a fine-grained, pixel-wise level. Moreover, a Spatio-temporal Anchor Module (SAM) is employed to fix spatio-temporal feature anchors and thereby make feature extraction more effective. We conduct extensive experiments on three large-scale action recognition datasets: Something-Something V1, Something-Something V2, and Kinetics-400. The results demonstrate that the proposed AGPN outperforms state-of-the-art methods on video action recognition.
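Illustrative sketch (not from the paper): the abstract describes a pyramid whose receptive field shrinks from the top level to the bottom level, with residual connections. The toy Python below only illustrates that general idea on a one-dimensional sequence of per-frame feature values. It is not the authors' AGPM: plain moving averages stand in for learned layers, and the function names `temporal_average` and `granularity_pyramid` and the window sizes are hypothetical choices made for this example.

```python
def temporal_average(features, window):
    """Moving average over `window` frames; edges are clamped so length is preserved."""
    n = len(features)
    half = window // 2
    out = []
    for t in range(n):
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        out.append(sum(features[lo:hi]) / (hi - lo))
    return out


def granularity_pyramid(features, windows=(7, 5, 3)):
    """Fuse coarse-to-fine temporal context (assumed toy scheme, not the paper's)."""
    if list(windows) != sorted(windows, reverse=True):
        raise ValueError("windows must shrink from the top level to the bottom level")
    fused = [0.0] * len(features)
    for window in windows:
        # Each lower level sees a smaller temporal receptive field (finer granularity).
        level = temporal_average(features, window)
        # The coarser result is passed down and summed with the finer level.
        fused = [f + l for f, l in zip(fused, level)]
    # Residual connection: add the mean of the levels back onto the input.
    return [x + f / len(windows) for x, f in zip(features, fused)]


if __name__ == "__main__":
    per_frame_feature = [0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0]
    print(granularity_pyramid(per_frame_feature))
```

In a real network the averaging would presumably be replaced by learned spatio-temporal operations over feature maps; the sketch only shows the top-to-bottom ordering and the residual sum.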
Pages: 3912-3923
Number of pages: 12
Related papers
50 in total
  • [1] Spatiotemporal Pyramid Network for Video Action Recognition
    Wang, Yunbo
    Long, Mingsheng
    Wang, Jianmin
    Yu, Philip S.
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 2097 - 2106
  • [2] Temporal Pyramid Network for Action Recognition
    Yang, Ceyuan
    Xu, Yinghao
    Shi, Jianping
    Dai, Bo
    Zhou, Bolei
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 588 - 597
  • [3] Knowledge granularity spectrum, action pyramid, and the scaling problem
    Ye, YM
    Tsotsos, JK
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2001, 15 (03) : 379 - 404
  • [4] Action Keypoint Network for Efficient Video Recognition
    Chen, Xu
    Han, Yahong
    Wang, Xiaohan
    Sun, Yifan
    Yang, Yi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4980 - 4993
  • [5] Binary Neural Network for Video Action Recognition
    Han, Hongfeng
    Lu, Zhiwu
    Wen, Ji-Rong
    MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 95 - 106
  • [6] Dense Dilated Network for Video Action Recognition
    Xu, Baohan
    Ye, Hao
    Zheng, Yingbin
    Wang, Heng
    Luwang, Tianyu
    Jiang, Yu-Gang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (10) : 4941 - 4953
  • [7] Temporal Pyramid Pooling Based Relation Network for Action Recognition
    Zheng, Zhenxing
    An, Gaoyun
    Ruan, Qiuqi
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 644 - 647
  • [8] Video Action Recognition Based on Spatio-temporal Feature Pyramid Module
    Gong, Suming
    Chen, Ying
    2020 13TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2020), 2020, : 338 - 341
  • [9] Multi-scale spatialtemporal information deep fusion network with temporal pyramid mechanism for video action recognition
    Ou, Hongshi
    Sun, Jifeng
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 41 (03) : 4533 - 4545
  • [10] Residual attention fusion network for video action recognition
    Li, Ao
    Yi, Yang
    Liang, Daan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98