Top-Down Deep Appearance Attention for Action Recognition

Cited: 0
|
Authors
Anwer, Rao Muhammad [1 ]
Khan, Fahad Shahbaz [2 ]
van de Weijer, Joost [3]
Laaksonen, Jorma [1 ]
Affiliations
[1] Aalto Univ, Sch Sci, Dept Comp Sci, Espoo, Finland
[2] Linkoping Univ, Comp Vis Lab, Linkoping, Sweden
[3] Univ Autonoma Barcelona, Comp Vis Ctr, CS Dept, Barcelona, Spain
Source
IMAGE ANALYSIS, SCIA 2017, PT I | 2017年 / 10269卷
Funding
Academy of Finland;
Keywords
Action recognition; CNNs; Feature fusion; Features;
DOI
10.1007/978-3-319-59126-1_25
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing human actions in videos is a challenging problem in computer vision. Recently, convolutional neural network (CNN) based deep features have shown promising results for action recognition. In this paper, we investigate the problem of fusing deep appearance and motion cues for action recognition. We propose a video representation that combines deep appearance and motion based local convolutional features within the bag-of-deep-features framework. First, dense deep appearance and motion based local convolutional features are extracted from spatial (RGB) and temporal (flow) networks, respectively. Both visual cues are processed in parallel by constructing separate visual vocabularies for appearance and motion. A category-specific appearance map is then learned to modulate the weights of the deep motion features. The proposed representation is discriminative and binds the deep local convolutional features to their spatial locations. Experiments are performed on two challenging datasets: the JHMDB dataset with 21 action classes and the ACT dataset with 43 categories. The results clearly demonstrate that our approach outperforms both standard approaches of early and late feature fusion. Furthermore, although our approach employs only action labels and does not exploit body-part information, it achieves performance competitive with state-of-the-art deep-feature-based approaches.
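The core fusion idea described in the abstract — motion descriptors voting into a visual vocabulary, with each vote modulated by an appearance-based attention weight at that spatial location — can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's implementation: the function name, hard vocabulary assignment, and the form of the attention weights are all assumptions for illustration.

```python
import numpy as np

def attention_weighted_bow(motion_feats, attention, vocab):
    """Hypothetical sketch: bag-of-deep-features histogram in which each
    local motion descriptor's vote is scaled by a per-location appearance
    attention weight (a stand-in for the paper's learned, category-specific
    appearance map).

    motion_feats: (N, D) local convolutional motion descriptors
    attention:    (N,)   appearance-based weight per spatial location
    vocab:        (K, D) motion visual vocabulary (e.g. k-means centroids)
    """
    # Squared Euclidean distance from every descriptor to every word
    dists = ((motion_feats[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)          # hard vocabulary assignment
    hist = np.zeros(len(vocab))
    np.add.at(hist, assign, attention)     # attention-modulated votes
    return hist / max(hist.sum(), 1e-12)   # L1-normalise the histogram
```

With uniform attention this reduces to a standard bag-of-features encoding; non-uniform weights let appearance cues emphasise motion features at action-relevant locations, which is the top-down modulation the title refers to.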
Pages: 297-309
Page count: 13