Top-Down Deep Appearance Attention for Action Recognition

Cited: 0
Authors
Anwer, Rao Muhammad [1 ]
Khan, Fahad Shahbaz [2 ]
van de Weijer, Joost [3 ]
Laaksonen, Jorma [1 ]
Affiliations
[1] Aalto Univ, Sch Sci, Dept Comp Sci, Espoo, Finland
[2] Linkoping Univ, Comp Vis Lab, Linkoping, Sweden
[3] Univ Autonoma Barcelona, Comp Vis Ctr, CS Dept, Barcelona, Spain
Funding
Academy of Finland
Keywords
Action recognition; CNNs; Feature fusion; Features
DOI
10.1007/978-3-319-59126-1_25
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Recognizing human actions in videos is a challenging problem in computer vision. Recently, deep features based on convolutional neural networks have shown promising results for action recognition. In this paper, we investigate the problem of fusing deep appearance and motion cues for action recognition. We propose a video representation that combines deep appearance- and motion-based local convolutional features within the bag-of-deep-features framework. First, dense appearance- and motion-based local convolutional features are extracted from spatial (RGB) and temporal (flow) networks, respectively. Both visual cues are processed in parallel by constructing separate visual vocabularies for appearance and motion. A category-specific appearance map is then learned to modulate the weights of the deep motion features. The proposed representation is discriminative and binds the deep local convolutional features to their spatial locations. Experiments are performed on two challenging datasets: the JHMDB dataset with 21 action classes and the ACT dataset with 43 categories. The results clearly demonstrate that our approach outperforms both standard early and late feature fusion. Further, our approach employs only action labels, without exploiting body-part information, yet achieves performance competitive with state-of-the-art deep-feature-based approaches.
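The core mechanism the abstract describes, a learned category-specific appearance map modulating the weights of co-located motion features, can be illustrated with a minimal sketch. This is not the authors' code; the function name, array shapes, and the use of per-class posteriors over appearance words are assumptions made for illustration, following the general top-down attention scheme where each local motion feature's vote into a bag-of-features histogram is weighted by the class probability of the appearance word at the same spatial location.

```python
import numpy as np

def attention_motion_histogram(app_words, mot_words, p_class_given_app, n_motion_words):
    """Hypothetical sketch: per-class motion-word histograms where each local
    feature's vote is weighted by the class posterior of its co-located
    appearance word (top-down appearance attention).

    app_words:         (n_features,) appearance-vocabulary index per local feature
    mot_words:         (n_features,) motion-vocabulary index per local feature
    p_class_given_app: (n_classes, n_app_words) learned class posteriors
    """
    n_classes = p_class_given_app.shape[0]
    hist = np.zeros((n_classes, n_motion_words))
    for a, m in zip(app_words, mot_words):
        # Top-down modulation: the appearance word decides how strongly this
        # motion word counts toward each action class.
        hist[:, m] += p_class_given_app[:, a]
    # L1-normalize each class channel so histograms are comparable across videos.
    sums = hist.sum(axis=1, keepdims=True)
    return hist / np.maximum(sums, 1e-12)
```

Concatenating the per-class channels yields a representation that, unlike plain early or late fusion, ties motion evidence to where class-relevant appearance occurs.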
Pages: 297-309 (13 pages)
Related Papers (50 total)
  • [1] Top-down attention recurrent VLAD encoding for action recognition in videos
    Sudhakaran, Swathikiran
    Lanz, Oswald
    INTELLIGENZA ARTIFICIALE, 2019, 13 (01) : 107 - 118
  • [2] Top-Down Attention Recurrent VLAD Encoding for Action Recognition in Videos
    Sudhakaran, Swathikiran
    Lanz, Oswald
    AI*IA 2018 - ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, 11298 : 375 - 386
  • [3] Top-Down Color Attention for Object Recognition
    Khan, Fahad Shahbaz
    van de Weijer, Joost
    Vanrell, Maria
    2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2009, : 979 - 986
  • [4] Combining ICA and top-down attention for robust speech recognition
    Bae, UM
    Lee, SY
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 13, 2001, 13 : 765 - 771
  • [5] Spatial-Temporal Bottom-Up Top-Down Attention Model for Action Recognition
    Wang, Jinpeng
    Ma, Andy J.
    IMAGE AND GRAPHICS, ICIG 2019, PT I, 2019, 11901 : 81 - 92
  • [6] Sequential recognition of superimposed patterns with top-down selective attention
    Kim, BT
    Lee, SY
    COMPUTATIONAL NEUROSCIENCE: TRENDS IN RESEARCH 2004, 2004, : 633 - 640
  • [7] Sequential recognition of superimposed patterns with top-down selective attention
    Kim, BT
    Lee, SY
    NEUROCOMPUTING, 2004, 58 : 633 - 640
  • [8] Top-Down Attention Guidance Shapes Action Encoding in the pSTS
    Stehr, Daniel A.
    Zhou, Xiaojue
    Tisby, Mariel
    Hwu, Patrick T.
    Pyles, John A.
    Grossman, Emily D.
    CEREBRAL CORTEX, 2021, 31 (07) : 3522 - 3535
  • [9] Mechanisms of top-down attention
    Baluch, Farhan
    Itti, Laurent
    TRENDS IN NEUROSCIENCES, 2011, 34 (04) : 210 - 224
  • [10] Semantic parts based top-down pyramid for action recognition
    Zhao, Zhichen
    Ma, Huimin
    Chen, Xiaozhi
    PATTERN RECOGNITION LETTERS, 2016, 84 : 134 - 141