Learning Spatiotemporal Attention for Egocentric Action Recognition

Cited by: 14
Authors
Lu, Minlong [1 ,2 ]
Liao, Danping [3 ]
Li, Ze-Nian [1 ]
Affiliations
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada
[2] Huawei Technol, Shenzhen, Peoples R China
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
DOI: 10.1109/ICCVW.2019.00543
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Recognizing a camera wearer's actions from videos captured by a head-mounted camera is a challenging task. Previous methods often utilize attention models to characterize the relevant spatial regions and thereby facilitate egocentric action recognition. Inspired by recent advances in spatiotemporal feature learning with 3D convolutions, we propose a simple yet efficient module for learning spatiotemporal attention in egocentric videos, using human gaze as supervision. Our model employs a two-stream architecture consisting of an appearance-based stream and a motion-based stream. Each stream contains a spatiotemporal attention module (STAM) that produces an attention map, which helps the model focus on the spatiotemporal regions of the video that are relevant for action recognition. Experimental results demonstrate that our model outperforms state-of-the-art methods by a large margin on the standard EGTEA Gaze+ dataset and produces attention maps that are consistent with human gaze.
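The STAM described in the abstract can be made concrete with a short sketch. Below is a minimal PyTorch illustration written in the spirit of the description above: a lightweight 3D-convolutional head predicts a single space-time attention map over backbone features (from either the appearance or the motion stream), and a KL-divergence term aligns that map with a human-gaze heat map. The layer sizes, the specific gaze loss, and all internal names (the module's layers, gaze_supervision_loss) are assumptions for illustration, not the paper's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class STAM(nn.Module):
    """Sketch of a spatiotemporal attention module: predicts a (T, H, W)
    attention map from 3D-CNN features and reweights the features with it.
    Layer choices are illustrative, not the paper's exact design."""

    def __init__(self, in_channels: int, hidden: int = 64):
        super().__init__()
        self.conv1 = nn.Conv3d(in_channels, hidden, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(hidden, 1, kernel_size=1)  # single attention channel

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, T, H, W) from an appearance- or motion-stream backbone.
        b, c, t, h, w = feats.shape
        logits = self.conv2(F.relu(self.conv1(feats)))         # (B, 1, T, H, W)
        # Softmax over the whole space-time volume, so the map is a distribution.
        attn = F.softmax(logits.view(b, -1), dim=1).view(b, 1, t, h, w)
        attended = feats * attn * (t * h * w)                  # rescale so the mean weight is ~1
        return attended, attn

def gaze_supervision_loss(attn: torch.Tensor, gaze_map: torch.Tensor) -> torch.Tensor:
    # KL divergence between the predicted attention distribution and a
    # normalized gaze heat map (an assumed form of gaze supervision).
    b = attn.size(0)
    p = gaze_map.view(b, -1)
    p = p / p.sum(dim=1, keepdim=True).clamp_min(1e-8)
    log_q = torch.log(attn.view(b, -1).clamp_min(1e-8))
    return F.kl_div(log_q, p, reduction="batchmean")

if __name__ == "__main__":
    feats = torch.randn(2, 256, 4, 7, 7)   # dummy backbone features
    gaze = torch.rand(2, 1, 4, 7, 7)       # dummy gaze heat maps
    stam = STAM(in_channels=256)
    attended, attn = stam(feats)
    loss = gaze_supervision_loss(attn, gaze)
    print(attended.shape, attn.shape, loss.item())

Normalizing the attention with a softmax over the full space-time volume (rather than per frame) is one plausible reading of "spatiotemporal attention": it lets the module downweight entire irrelevant frames as well as irrelevant spatial regions within a frame.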
Pages: 4425-4434
Page count: 10
Related Papers (items 21-30 of 50)
  • [21] Zhang, Jiachao; Tong, Ying; Jiao, Liangbao. Learning Spatiotemporal-Selected Representations in Videos for Action Recognition. Journal of Circuits, Systems and Computers, 2023, 32(12).
  • [22] Li, Hongyang; Chen, Jun; Hu, Ruimin; Yu, Mei; Chen, Huafeng; Xu, Zengmin. Action Recognition Using Visual Attention with Reinforcement Learning. Multimedia Modeling (MMM 2019), Part II, 2019, 11296: 365-376.
  • [23] Yudistira, Novanto; Kurita, Takio. Correlation Net: Spatiotemporal Multimodal Deep Learning for Action Recognition. Signal Processing: Image Communication, 2020, 82.
  • [24] Li, Haoxin; Zheng, Wei-Shi; Zhang, Jianguo; Hu, Haifeng; Lu, Jiwen; Lai, Jian-Huang. Egocentric Action Recognition by Automatic Relation Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 489-507.
  • [25] Singh, Suriya; Arora, Chetan; Jawahar, C. V. Generic Action Recognition from Egocentric Videos. 2015 Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2015.
  • [26] Jiang, Haiyu; Song, Yan; He, Jiang; Shu, Xiangbo. Cross Fusion for Egocentric Interactive Action Recognition. Multimedia Modeling (MMM 2020), Part I, 2020, 11961: 714-726.
  • [27] Goletto, Gabriele; Planamente, Mirco; Caputo, Barbara; Averta, Giuseppe. Bringing Online Egocentric Action Recognition Into the Wild. IEEE Robotics and Automation Letters, 2023, 8(4): 2333-2340.
  • [28] Yonemoto, Haruka; Murasaki, Kazuhiko; Osawa, Tatsuya; Sudo, Kyoko; Shimamura, Jun; Taniguchi, Yukinobu. Egocentric Articulated Pose Tracking for Action Recognition. 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015: 98-101.
  • [29] Matsuo, Kenji; Yamada, Kentaro; Ueno, Satoshi; Naito, Sei. An Attention-based Activity Recognition for Egocentric Video. 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2014: 565+.
  • [30] Liang, Chaolei; Zou, Wei; Hu, Danfeng; Wang, JiaJun. Facial Action Unit Recognition Based on Self-Attention Spatiotemporal Fusion. 2024 5th International Conference on Computing, Networks and Internet of Things (CNIOT 2024), 2024: 600-605.