Residual attention unit for action recognition

Cited by: 7
Authors
Liao, Zhongke [1 ]
Hu, Haifeng [1 ]
Zhang, Junxuan [1 ]
Yin, Chang [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Residual learning; Attention; Background motion;
DOI
10.1016/j.cviu.2019.102821
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
3D CNNs are powerful tools for action recognition that can intuitively extract spatio-temporal features from raw videos. However, most existing 3D CNNs do not fully account for the detrimental effects of the background motion that frequently appears in videos. Background motion is often misclassified as part of the human action, which can undermine modeling of the action's dynamic pattern. In this paper, we propose the residual attention unit (RAU) to address this problem. RAU suppresses background motion by upweighting the values associated with the foreground region in the feature maps. Specifically, RAU contains two separate submodules in parallel: spatial attention and channel-wise attention. Given an intermediate feature map, the spatial attention works in a bottom-up top-down manner to generate an attention mask, while the channel-wise attention automatically recalibrates the feature responses of all channels. Because applying the attention mechanism directly to the input features may discard discriminative information, we design a bypass that preserves the integrity of the original features via a shortcut connection between the input and output of the attention module. Notably, RAU can be easily embedded into 3D CNNs and trained end-to-end along with the network. Experimental results on UCF101 and HMDB51 demonstrate the effectiveness of RAU.
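
The abstract fully determines the unit's interface (a shape-preserving module on a 3D feature map) but not its internals. Below is a minimal PyTorch sketch of such a residual attention unit; the class name, the squeeze-and-excitation-style channel branch, the single pool/upsample hourglass standing in for the bottom-up top-down spatial branch, and the multiplicative combination of the two parallel masks are illustrative assumptions, not the authors' exact implementation.

```python
# A minimal sketch of a residual attention unit, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualAttentionUnit(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Spatial branch: operates on a downsampled ("bottom-up") copy of
        # the features; the mask is upsampled back ("top-down") in forward().
        self.spatial = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, 1, kernel_size=1),
        )
        # Channel branch: squeeze-and-excitation-style recalibration
        # (assumes channels >= reduction).
        self.channel = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, H, W) intermediate feature map from a 3D CNN.
        n, c, t, h, w = x.shape

        # Bottom-up: pool to a coarser spatio-temporal grid;
        # top-down: restore the original resolution.
        down = F.max_pool3d(x, kernel_size=2)
        mask = F.interpolate(self.spatial(down), size=(t, h, w),
                             mode="trilinear", align_corners=False)
        spatial_mask = torch.sigmoid(mask)        # (N, 1, T, H, W)

        # Squeeze: global average pooling over T, H, W; excite: per-channel
        # gates that recalibrate the feature responses of all channels.
        gate = torch.sigmoid(self.channel(x.mean(dim=(2, 3, 4))))
        channel_gate = gate.view(n, c, 1, 1, 1)   # (N, C, 1, 1, 1)

        # Apply both attention branches, then add the shortcut so the
        # original (unattended) features are preserved.
        return x + x * spatial_mask * channel_gate

# Usage: the unit keeps the feature-map shape unchanged, so it can be
# dropped between stages of an existing 3D CNN and trained end-to-end.
rau = ResidualAttentionUnit(channels=64)
feat = torch.randn(2, 64, 8, 28, 28)  # (N, C, T, H, W)
out = rau(feat)                       # shape: (2, 64, 8, 28, 28)
```

The additive shortcut at the end reflects the bypass described in the abstract: even where both masks approach zero, the unattended input features pass through unchanged.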
Pages: 8
Related papers
50 records in total
  • [31] Upper Facial Action Unit Recognition
    Zor, Cemre
    Windeatt, Terry
    [J]. ADVANCES IN BIOMETRICS, 2009, 5558 : 239 - 248
  • [32] A Novel Attention Residual Network Expression Recognition Method
    Qi, Hui
    Zhang, Xipeng
    Shi, Ying
    Qi, Xiaobo
    [J]. IEEE ACCESS, 2024, 12 : 24609 - 24620
  • [33] Self Residual Attention Network For Deep Face Recognition
    Ling, Hefei
    Wu, Jiyang
    Wu, Lei
    Huang, Junrui
    Chen, Jiazhong
    Li, Ping
    [J]. IEEE ACCESS, 2019, 7 : 55159 - 55168
  • [34] Spatiotemporal Residual Networks for Video Action Recognition
    Feichtenhofer, Christoph
    Pinz, Axel
    Wildes, Richard P.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [35] Deep Attention Network for Egocentric Action Recognition
    Lu, Minlong
    Li, Ze-Nian
    Wang, Yueming
    Pan, Gang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (08) : 3703 - 3713
  • [36] Adversarial Attention Networks for Early Action Recognition
    Zhang, Hong-Bo
    Pan, Wei-Xiang
    Du, Ji-Xiang
    Lei, Qing
    Chen, Yan
    Liu, Jing-Hua
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024,
  • [37] Nesting spatiotemporal attention networks for action recognition
    Li, Jiapeng
    Wei, Ping
    Zheng, Nanning
    [J]. NEUROCOMPUTING, 2021, 459 : 338 - 348
  • [38] Spatial-Temporal Attention for Action Recognition
    Sun, Dengdi
    Wu, Hanqing
    Ding, Zhuanlian
    Luo, Bin
    Tang, Jin
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164 : 854 - 864
  • [39] Temporal Cross-Attention for Action Recognition
    Hashiguchi, Ryota
    Tamaki, Toru
    [J]. COMPUTER VISION - ACCV 2022 WORKSHOPS, 2023, 13848 : 283 - 294
  • [40] Dynamic Tracking Attention Model for Action Recognition
    Wang, Chien-Yao
    Chiang, Chin-Chin
    Ding, Jian-Jiun
    Wang, Jia-Ching
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 1617 - 1621