Learning Spatiotemporal Attention for Egocentric Action Recognition

Cited by: 14
Authors
Lu, Minlong [1 ,2 ]
Liao, Danping [3 ]
Li, Ze-Nian [1 ]
Affiliations
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada
[2] Huawei Technol, Shenzhen, Peoples R China
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
DOI: 10.1109/ICCVW.2019.00543
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Recognizing a camera wearer's actions from videos captured by a head-mounted camera is a challenging task. Previous methods often utilize attention models to characterize the relevant spatial regions and thereby facilitate egocentric action recognition. Inspired by recent advances in spatiotemporal feature learning with 3D convolutions, we propose a simple yet efficient module for learning spatiotemporal attention in egocentric videos, using human gaze as supervision. Our model employs a two-stream architecture consisting of an appearance-based stream and a motion-based stream. Each stream contains a spatiotemporal attention module (STAM) that produces an attention map, which helps the model focus on the spatiotemporal regions of the video that are relevant for action recognition. Experimental results demonstrate that our model outperforms state-of-the-art methods by a large margin on the standard EGTEA Gaze+ dataset and produces attention maps that are consistent with human gaze.
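The STAM described in the abstract can be made concrete with a short sketch. Below is a minimal PyTorch illustration written in the spirit of the description above: a lightweight 3D-convolutional head predicts a single space-time attention map over backbone features (from either the appearance or the motion stream), and a KL-divergence term aligns that map with a human-gaze heat map. The layer sizes, the specific gaze loss, and all internal names (the module's layers, gaze_supervision_loss) are assumptions for illustration, not the paper's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class STAM(nn.Module):
    """Sketch of a spatiotemporal attention module: predicts a (T, H, W)
    attention map from 3D-CNN features and reweights the features with it.
    Layer choices are illustrative, not the paper's exact design."""

    def __init__(self, in_channels: int, hidden: int = 64):
        super().__init__()
        self.conv1 = nn.Conv3d(in_channels, hidden, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(hidden, 1, kernel_size=1)  # single attention channel

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, T, H, W) from an appearance- or motion-stream backbone.
        b, c, t, h, w = feats.shape
        logits = self.conv2(F.relu(self.conv1(feats)))         # (B, 1, T, H, W)
        # Softmax over the whole space-time volume, so the map is a distribution.
        attn = F.softmax(logits.view(b, -1), dim=1).view(b, 1, t, h, w)
        attended = feats * attn * (t * h * w)                  # rescale so the mean weight is ~1
        return attended, attn

def gaze_supervision_loss(attn: torch.Tensor, gaze_map: torch.Tensor) -> torch.Tensor:
    # KL divergence between the predicted attention distribution and a
    # normalized gaze heat map (an assumed form of gaze supervision).
    b = attn.size(0)
    p = gaze_map.view(b, -1)
    p = p / p.sum(dim=1, keepdim=True).clamp_min(1e-8)
    log_q = torch.log(attn.view(b, -1).clamp_min(1e-8))
    return F.kl_div(log_q, p, reduction="batchmean")

if __name__ == "__main__":
    feats = torch.randn(2, 256, 4, 7, 7)   # dummy backbone features
    gaze = torch.rand(2, 1, 4, 7, 7)       # dummy gaze heat maps
    stam = STAM(in_channels=256)
    attended, attn = stam(feats)
    loss = gaze_supervision_loss(attn, gaze)
    print(attended.shape, attn.shape, loss.item())

Normalizing the attention with a softmax over the full space-time volume (rather than per frame) is one plausible reading of "spatiotemporal attention": it lets the module downweight entire irrelevant frames as well as irrelevant spatial regions within a frame.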
Pages: 4425-4434
Page count: 10
Related Papers (items 21-30 of 50)
  • [21] Zhang, Jiachao; Tong, Ying; Jiao, Liangbao. Learning Spatiotemporal-Selected Representations in Videos for Action Recognition. Journal of Circuits, Systems and Computers, 2023, 32(12).
  • [22] Li, Hongyang; Chen, Jun; Hu, Ruimin; Yu, Mei; Chen, Huafeng; Xu, Zengmin. Action Recognition Using Visual Attention with Reinforcement Learning. Multimedia Modeling (MMM 2019), Part II, 2019, 11296: 365-376.
  • [23] Yudistira, Novanto; Kurita, Takio. Correlation Net: Spatiotemporal Multimodal Deep Learning for Action Recognition. Signal Processing: Image Communication, 2020, 82.
  • [24] Li, Haoxin; Zheng, Wei-Shi; Zhang, Jianguo; Hu, Haifeng; Lu, Jiwen; Lai, Jian-Huang. Egocentric Action Recognition by Automatic Relation Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 489-507.
  • [25] Singh, Suriya; Arora, Chetan; Jawahar, C. V. Generic Action Recognition from Egocentric Videos. 2015 Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2015.
  • [26] Jiang, Haiyu; Song, Yan; He, Jiang; Shu, Xiangbo. Cross Fusion for Egocentric Interactive Action Recognition. Multimedia Modeling (MMM 2020), Part I, 2020, 11961: 714-726.
  • [27] Goletto, Gabriele; Planamente, Mirco; Caputo, Barbara; Averta, Giuseppe. Bringing Online Egocentric Action Recognition Into the Wild. IEEE Robotics and Automation Letters, 2023, 8(4): 2333-2340.
  • [28] Yonemoto, Haruka; Murasaki, Kazuhiko; Osawa, Tatsuya; Sudo, Kyoko; Shimamura, Jun; Taniguchi, Yukinobu. Egocentric Articulated Pose Tracking for Action Recognition. 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015: 98-101.
  • [29] Matsuo, Kenji; Yamada, Kentaro; Ueno, Satoshi; Naito, Sei. An Attention-based Activity Recognition for Egocentric Video. 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2014: 565+.
  • [30] Liang, Chaolei; Zou, Wei; Hu, Danfeng; Wang, JiaJun. Facial Action Unit Recognition Based on Self-Attention Spatiotemporal Fusion. 2024 5th International Conference on Computing, Networks and Internet of Things (CNIOT 2024), 2024: 600-605.