Spatial attention based visual semantic learning for action recognition in still images

Cited by: 12
Authors
Zheng, Yunpeng [1, 2]
Zheng, Xiangtao [1]
Lu, Xiaoqiang [1]
Wu, Siyuan [1]
Affiliations
[1] Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Still image-based action recognition; Spatial attention; Semantic parts; Deep learning; Model
DOI
10.1016/j.neucom.2020.07.016
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual semantic parts play crucial roles in still image-based action recognition. Most existing methods require manual annotations beyond action labels, such as human bounding boxes and predefined body parts, to learn action-related visual semantic parts. However, producing these annotations is time-consuming and labor-intensive. Moreover, not all manual annotations help in recognizing a specific action; some can be irrelevant or even misleading. To address these limitations, this paper proposes a multi-stage deep learning method called Spatial Attention based Action Mask Networks (SAAM-Nets). The proposed method needs no annotations other than action labels to obtain action-specific visual semantic parts. Instead, we inject a spatial attention layer into a convolutional neural network to create an action-specific mask for each image, trained with action labels only. Based on this action mask, we propose a region selection strategy that generates a semantic bounding box containing the action-specific semantic parts. Finally, to effectively combine information from the whole scene and the semantic box, two feature attention layers are adopted to obtain more discriminative representations. Experiments on four benchmark datasets demonstrate that the proposed method achieves promising performance compared with state-of-the-art methods. (C) 2020 Elsevier B.V. All rights reserved.
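The abstract describes the pipeline (attention mask, region selection, feature fusion) only at a high level. The following is a minimal NumPy sketch of the general idea, not the SAAM-Nets implementation. Several pieces are assumptions introduced for illustration only: the attention score is a hypothetical 1×1 projection `w` followed by a softmax over all spatial locations, region selection keeps cells whose attention is at least `ratio` times the peak value, and the two feature attention layers are simplified into a single hypothetical scoring vector `v` that weights the scene and box features.

```python
import numpy as np


def spatial_attention_mask(features: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical spatial attention: score each location with a 1x1
    projection w (shape (C,)) over features (shape (C, H, W)), then apply
    a softmax over all H*W locations so the mask sums to 1."""
    scores = np.tensordot(w, features, axes=([0], [0]))  # shape (H, W)
    flat = scores.reshape(-1)
    flat = np.exp(flat - flat.max())  # subtract max for numerical stability
    return (flat / flat.sum()).reshape(scores.shape)


def select_region(mask: np.ndarray, ratio: float = 0.5):
    """Assumed region selection rule: bounding box (top, left, bottom, right),
    inclusive grid indices, of every cell with attention >= ratio * peak."""
    ys, xs = np.nonzero(mask >= ratio * mask.max())
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())


def scale_box(box, mask_shape, image_shape):
    """Map an inclusive grid box to pixel coordinates (bottom/right exclusive)."""
    top, left, bottom, right = box
    gh, gw = mask_shape
    ih, iw = image_shape
    return (top * ih // gh, left * iw // gw,
            (bottom + 1) * ih // gh, (right + 1) * iw // gw)


def fuse(scene_feat: np.ndarray, box_feat: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Simplified feature attention (assumption): score each stream with v,
    softmax the two scores, and return the weighted sum of the features."""
    scores = np.array([v @ scene_feat, v @ box_feat])
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a[0] * scene_feat + a[1] * box_feat


if __name__ == "__main__":
    # Toy 7x7 mask with a strongly attended patch in rows 2-3, columns 3-5.
    mask = np.zeros((7, 7))
    mask[2:4, 3:6] = 1.0
    box = select_region(mask, ratio=0.5)
    print(box, scale_box(box, mask.shape, (224, 224)))
```

Thresholding relative to the peak keeps the rule independent of the mask's absolute scale, which matters because a softmax over H×W locations produces small values for large feature maps. The cropped box would then be fed through a second network stream before fusion; that stage is omitted here.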
Pages: 383-396
Page count: 14
Related papers
50 in total
  • [31] Audio-Visual Event Localization by Learning Spatial and Semantic Co-Attention
    Xue, Cheng
    Zhong, Xionghu
    Cai, Minjie
    Chen, Hao
    Wang, Wenwu
    [J]. IEEE Transactions on Multimedia, 2023, 25: 418-429
  • [32] Symbolic control of visual attention: Semantic constraints on the spatial distribution of attention
    Gibson, Bradley S.
    Scheutz, Matthias
    Davis, Gregory J.
    [J]. Attention, Perception, & Psychophysics, 2009, 71(2): 363-374
  • [33] ESS: Learning Event-Based Semantic Segmentation from Still Images
    Sun, Zhaoning
    Messikommer, Nico
    Gehrig, Daniel
    Scaramuzza, Davide
    [J]. Computer Vision, ECCV 2022, Part XXXIV, 2022, 13694: 341-357
  • [35] The role of spatial attention in visual object recognition
    Shyi, GCW
    Cheng, SK
    [J]. International Journal of Psychology, 1996, 31(3-4): 4841
  • [36] Context Enhancement Methodology for Action Recognition in Still Images
    He, Jiarong
    Wu, Wei
    Li, Yuxing
    [J]. Artificial Neural Networks and Machine Learning, ICANN 2023, Part I, 2023, 14254: 112-122
  • [37] Temporal Hallucinating for Action Recognition with Few Still Images
    Wang, Yali
    Zhou, Lei
    Qiao, Yu
    [J]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 5314-5322
  • [38] Action Recognition in Still Images With Minimum Annotation Efforts
    Zhang, Yu
    Cheng, Li
    Wu, Jianxin
    Cai, Jianfei
    Do, Minh N.
    Lu, Jiangbo
    [J]. IEEE Transactions on Image Processing, 2016, 25(11): 5479-5490
  • [39] Spatial-Temporal Attention for Action Recognition
    Sun, Dengdi
    Wu, Hanqing
    Ding, Zhuanlian
    Luo, Bin
    Tang, Jin
    [J]. Advances in Multimedia Information Processing, Part I, 2018, 11164: 854-864
  • [40] Loss Guided Activation for Action Recognition in Still Images
    Liu, Lu
    Tan, Robby T.
    You, Shaodi
    [J]. Computer Vision, ACCV 2018, Part V, 2019, 11365: 152-167