Spatial attention based visual semantic learning for action recognition in still images

Cited by: 12
Authors
Zheng, Yunpeng [1, 2]
Zheng, Xiangtao [1 ]
Lu, Xiaoqiang [1 ]
Wu, Siyuan [1 ]
Affiliations
[1] Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Still image-based action recognition; Spatial attention; Semantic parts; Deep learning; MODEL;
DOI
10.1016/j.neucom.2020.07.016
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual semantic parts play crucial roles in still image-based action recognition. A majority of existing methods require additional manual annotations, such as human bounding boxes and predefined body parts, besides action labels to learn action-related visual semantic parts. However, producing these manual annotations is time-consuming and labor-intensive. Moreover, not all manual annotations are effective when recognizing a specific action. Some of them can be irrelevant and even misleading. To address these limitations, this paper proposes a multi-stage deep learning method called Spatial Attention based Action Mask Networks (SAAM-Nets). The proposed method does not need any additional annotations besides action labels to obtain action-specific visual semantic parts. Instead, we propose a spatial attention layer inserted into a convolutional neural network to create a specific action mask for each image with only action labels. Moreover, based on the action mask, we propose a region selection strategy to generate a semantic bounding box containing action-specific semantic parts. Furthermore, to effectively combine the information of the whole scene and the semantic box, two feature attention layers are adopted to obtain more discriminative representations. Experiments on four benchmark datasets have demonstrated that the proposed method can achieve promising performance compared with state-of-the-art methods. (C) 2020 Elsevier B.V. All rights reserved.
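The abstract does not say how the region selection strategy turns an action mask into a semantic bounding box. As a rough illustration only, and not the authors' method, the sketch below assumes a non-negative 2-D mask and keeps the tightest box around every pixel whose value reaches a fraction of the mask maximum. The function name `mask_to_box` and the `ratio` parameter are hypothetical.

```python
import numpy as np

def mask_to_box(mask, ratio=0.5):
    """Return (x_min, y_min, x_max, y_max) around the strongest mask region.

    Assumption: `mask` is a non-negative 2-D attention map. A simple
    threshold stands in for the paper's unspecified selection strategy.
    """
    mask = np.asarray(mask, dtype=float)
    # Keep pixels whose response reaches `ratio` of the peak response.
    keep = mask >= ratio * mask.max()
    # Indices of rows and columns that contain at least one kept pixel.
    rows = np.flatnonzero(keep.any(axis=1))
    cols = np.flatnonzero(keep.any(axis=0))
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])

# Toy 5x5 mask: a strong 2x2 blob plus one weak response in a corner.
toy_mask = np.zeros((5, 5))
toy_mask[1:3, 2:4] = 1.0
toy_mask[4, 0] = 0.2
box = mask_to_box(toy_mask)
```

On this toy mask the weak corner response falls below the threshold, so the box covers only the 2x2 blob. A box of this kind could then be used to crop the image for a second, region-level feature stream.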
Pages: 383-396
Page count: 14
Related papers
50 in total
  • [1] Boxless Action Recognition in Still Images via Recurrent Visual Attention
    Feng, Weijiang
    Zhang, Xiang
    Huang, Xuhui
    Luo, Zhigang
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2017), PT II, 2017, 10635 : 663 - 673
  • [2] Attention Focused Spatial Pyramid Pooling for Boxless Action Recognition in Still Images
    Feng, Weijiang
    Zhang, Xiang
    Huang, Xuhui
    Luo, Zhigang
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, PT II, 2017, 10614 : 574 - 581
  • [3] Action Recognition with Visual Attention on Skeleton Images
    Yang, Zhengyuan
    Li, Yuncheng
    Yang, Jianchao
    Luo, Jiebo
    [J]. 2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 3309 - 3314
  • [4] Multibranch Attention Networks for Action Recognition in Still Images
    Yan, Shiyang
    Smith, Jeremy S.
    Lu, Wenjin
    Zhang, Bailing
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2018, 10 (04) : 1116 - 1125
  • [5] Action recognition in still images by learning spatial interest regions from videos
    Eweiwi, Abdalrahman
    Cheema, Muhammad Shahzad
    Bauckhage, Christian
    [J]. PATTERN RECOGNITION LETTERS, 2015, 51 : 8 - 15
  • [6] Learning Semantic-Aware Spatial-Temporal Attention for Interpretable Action Recognition
    Fu, Jie
    Gao, Junyu
    Xu, Changsheng
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5213 - 5224
  • [7] Semantic Reinforced Attention Learning for Visual Place Recognition
    Peng, Guohao
    Yue, Yufeng
    Zhang, Jun
    Wu, Zhenyu
    Tang, Xiaoyu
    Wang, Danwei
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 13415 - 13422
  • [8] Action Recognition from Still Images Based on Deep VLAD Spatial Pyramids
    Yan, Shiyang
    Smith, Jeremy S.
    Zhang, Bailing
    [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2017, 54 : 118 - 129
  • [9] Learning Hierarchical Context for Action Recognition in Still Images
    Zhu, Haisheng
    Hu, Jian-Fang
    Zheng, Wei-Shi
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT III, 2018, 11166 : 67 - 77
  • [10] LEARNING DISCRIMINATIVE ACTION AND CONTEXT REPRESENTATIONS FOR ACTION RECOGNITION IN STILL IMAGES
    Xin, Miao
    Zhang, Hong
    Yuan, Ding
    Sun, Mingui
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 757 - 762