Learning Attention-Enhanced Spatiotemporal Representation for Action Recognition

Cited by: 11
Authors
Shi, Zhensheng [1 ]
Cao, Liangjie [1 ]
Guan, Cheng [1 ]
Zheng, Haiyong [1 ]
Gu, Zhaorui [1 ]
Yu, Zhibin [1 ]
Zheng, Bing [1 ]
Affiliations
[1] Ocean Univ China, Dept Elect Engn, Qingdao 266100, Peoples R China
Source
IEEE ACCESS | 2020, Vol. 8
Funding
National Natural Science Foundation of China
Keywords
Action recognition; video understanding; spatiotemporal representation; visual attention; 3D-CNN; residual learning;
DOI
10.1109/ACCESS.2020.2968024
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Learning spatiotemporal features via 3D-CNN (3D Convolutional Neural Network) models has been regarded as an effective approach for action recognition. In this paper, we explore the visual attention mechanism for video analysis and propose a novel 3D-CNN model, dubbed AE-I3D (Attention-Enhanced Inflated-3D Network), for learning attention-enhanced spatiotemporal representation. The contribution of our AE-I3D is threefold: First, we inflate soft attention to the spatiotemporal scope for 3D videos and adopt softmax to generate a probability distribution of attentional features in a feedforward 3D-CNN architecture. Second, we devise an AE-Res (Attention-Enhanced Residual learning) module, which learns attention-enhanced features via two-branch residual learning; the AE-Res module is lightweight and flexible, so it can be easily embedded into many 3D-CNN architectures. Finally, we embed multiple AE-Res modules into an I3D (Inflated-3D) network, yielding our AE-I3D model, which can be trained in an end-to-end, video-level manner. Different from previous attention networks, our method inflates residual attention from 2D images to 3D videos for 3D attention residual learning to enhance spatiotemporal representation. We use RGB-only video data for evaluation on three benchmarks: UCF101, HMDB51, and Kinetics. The experimental results demonstrate that our AE-I3D is effective and achieves competitive performance.
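The abstract describes a two-branch residual attention module for 3D video features: an attention branch produces a softmax probability distribution over attentional features, which then reweights a trunk branch and is combined residually with the input. The sketch below is a minimal PyTorch illustration of that idea only; the module name `AEResBlock`, the layer choices, and the exact residual combination are assumptions for illustration, not the authors' published design.

```python
import torch
import torch.nn as nn

class AEResBlock(nn.Module):
    """Illustrative attention-enhanced residual block for 3D video features.

    Trunk branch: a 3D convolution producing spatiotemporal features.
    Attention branch: a 1x1x1 convolution whose output is normalized with
    softmax over all T*H*W positions, giving a probability distribution
    of attentional features. The two branches are combined residually,
    so unattended information is preserved via the identity path.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.trunk = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.attn = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.trunk(x)
        n, c, t, h, w = feat.shape
        # Softmax over flattened spatiotemporal positions -> attention weights.
        weights = torch.softmax(self.attn(feat).view(n, c, -1), dim=-1)
        weights = weights.view(n, c, t, h, w)
        # Residual combination: identity branch plus attention-weighted trunk.
        return x + feat * weights

# Example: a batch of 2 clips, 16 channels, 8 frames of 14x14 features.
x = torch.randn(2, 16, 8, 14, 14)
y = AEResBlock(16)(x)
print(y.shape)
```

Because the block maps a `(N, C, T, H, W)` tensor to a tensor of the same shape, several such modules could in principle be interleaved with the stages of an I3D-style backbone, which matches the abstract's claim that AE-Res is lightweight enough to embed into many 3D-CNN architectures.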
Pages: 16785-16794 (10 pages)
Related Papers (50 total)
  • [1] Attention-enhanced gated recurrent unit for action recognition in tennis
    Gao, Meng
    Ju, Bingchun
    [J]. PEERJ COMPUTER SCIENCE, 2024, 10
  • [2] Learning Spatiotemporal Attention for Egocentric Action Recognition
    Lu, Minlong
    Liao, Danping
    Li, Ze-Nian
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4425 - 4434
  • [3] ATTENTION-ENHANCED SENSORIMOTOR OBJECT RECOGNITION
    Thermos, Spyridon
    Papadopoulos, Georgios Th.
    Daras, Petros
    Potamianos, Gerasimos
    [J]. 2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 336 - 340
  • [4] Attention-Enhanced Disentangled Representation Learning for Unsupervised Domain Adaptation in Cardiac Segmentation
    Sun, Xiaoyi
    Liu, Zhizhe
    Zheng, Shuai
    Lin, Chen
    Zhu, Zhenfeng
    Zhao, Yao
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT VII, 2022, 13437 : 745 - 754
  • [5] Spatiotemporal Saliency Representation Learning for Video Action Recognition
    Kong, Yongqiang
    Wang, Yunhong
    Li, Annan
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1515 - 1528
  • [6] Spatiotemporal attention enhanced features fusion network for action recognition
    Danfeng Zhuang
    Min Jiang
    Jun Kong
    Tianshan Liu
    [J]. International Journal of Machine Learning and Cybernetics, 2021, 12 : 823 - 841
  • [7] Spatiotemporal attention enhanced features fusion network for action recognition
    Zhuang, Danfeng
    Jiang, Min
    Kong, Jun
    Liu, Tianshan
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (03) : 823 - 841
  • [8] Attention-enhanced and trusted multimodal learning for micro-video venue recognition
    Wang, Bing
    Huang, Xianglin
    Cao, Gang
    Yang, Lifang
    Wei, Xiaolong
    Tao, Zhulin
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2022, 102
  • [9] APLNet: Attention-enhanced progressive learning network
    Zhang, Hui
    Kang, Danqing
    He, Haibo
    Wang, Fei-Yue
    [J]. NEUROCOMPUTING, 2020, 371 : 166 - 176
  • [10] Action recognition method based on multi-stream attention-enhanced recursive graph convolution
    Wang, Huaijun
    Bai, Bingqian
    Li, Junhuai
    Ke, Hui
    Xiang, Wei
    [J]. APPLIED INTELLIGENCE, 2024, 54 (20) : 10133 - 10147