Learning Spatiotemporal Attention for Egocentric Action Recognition

Cited by: 14
Authors
Lu, Minlong [1 ,2 ]
Liao, Danping [3 ]
Li, Ze-Nian [1 ]
Affiliations
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada
[2] Huawei Technol, Shenzhen, Peoples R China
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
DOI
10.1109/ICCVW.2019.00543
CLC Classification Code
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recognizing a camera wearer's actions from videos captured by a head-mounted camera is a challenging task. Previous methods often utilize attention models to characterize the relevant spatial regions and thereby facilitate egocentric action recognition. Inspired by recent advances in spatiotemporal feature learning using 3D convolutions, we propose a simple yet efficient module for learning spatiotemporal attention in egocentric videos, with human gaze as supervision. Our model employs a two-stream architecture consisting of an appearance-based stream and a motion-based stream. Each stream has a spatiotemporal attention module (STAM) that produces an attention map, which helps our model focus on the relevant spatiotemporal regions of the video for action recognition. The experimental results demonstrate that our model outperforms state-of-the-art methods by a large margin on the standard EGTEA Gaze+ dataset and produces attention maps that are consistent with human gaze.
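The paper's implementation is not part of this record, but the core idea the abstract describes, softmax-normalizing attention scores over all spatiotemporal positions of a feature volume and using the resulting map to weight the features, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, scalar features stand in for the 3D-convolutional feature vectors of the actual model, and the attention scores are taken as given rather than learned under gaze supervision.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a flat list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def spatiotemporal_attention(features, scores):
    """Attention-weighted pooling over a T x H x W feature volume.

    `features[t][h][w]` is a feature value (a scalar here for simplicity;
    in the paper each position would hold a 3D-conv feature vector), and
    `scores[t][h][w]` is an unnormalized attention score for that
    spatiotemporal position (learned under gaze supervision in the paper).
    Returns (attention_map, pooled_feature).
    """
    T, H, W = len(features), len(features[0]), len(features[0][0])
    flat = [scores[t][h][w] for t in range(T) for h in range(H) for w in range(W)]
    weights = softmax(flat)  # one weight per (t, h, w) position; sums to 1
    # Reshape the weights back to T x H x W: this is the attention map.
    attn = [[[weights[(t * H + h) * W + w] for w in range(W)]
             for h in range(H)] for t in range(T)]
    # Attention-weighted sum of features: high-score regions dominate.
    pooled = sum(attn[t][h][w] * features[t][h][w]
                 for t in range(T) for h in range(H) for w in range(W))
    return attn, pooled
```

For example, with a 2-frame, 1x2-pixel volume where one position has a much higher score, the pooled feature is dominated by that position's value, which is the sense in which the attention map lets the model "focus on the relevant spatiotemporal regions".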
Pages: 4425-4434 (10 pages)
Related Papers (50 records)
  • [1] Deep Attention Network for Egocentric Action Recognition
    Lu, Minlong
    Li, Ze-Nian
    Wang, Yueming
    Pan, Gang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (08) : 3703 - 3713
  • [2] Learning Attention-Enhanced Spatiotemporal Representation for Action Recognition
    Shi, Zhensheng
    Cao, Liangjie
    Guan, Cheng
    Zheng, Haiyong
    Gu, Zhaorui
    Yu, Zhibin
    Zheng, Bing
    [J]. IEEE ACCESS, 2020, 8 : 16785 - 16794
  • [3] Symbiotic Attention with Privileged Information for Egocentric Action Recognition
    Wang, Xiaohan
    Wu, Yu
    Zhu, Linchao
    Yang, Yi
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12249 - 12256
  • [4] Interactive Prototype Learning for Egocentric Action Recognition
    Wang, Xiaohan
    Zhu, Linchao
    Wang, Heng
    Yang, Yi
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 8148 - 8157
  • [5] Multitask Learning to Improve Egocentric Action Recognition
    Kapidis, Georgios
    Poppe, Ronald
    van Dam, Elsbeth
    Noldus, Lucas
    Veltkamp, Remco
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4396 - 4405
  • [6] Nesting spatiotemporal attention networks for action recognition
    Li, Jiapeng
    Wei, Ping
    Zheng, Nanning
    [J]. NEUROCOMPUTING, 2021, 459 : 338 - 348
  • [7] Symbiotic Attention for Egocentric Action Recognition With Object-Centric Alignment
    Wang, Xiaohan
    Zhu, Linchao
    Wu, Yu
    Yang, Yi
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 6605 - 6617
  • [8] LSTA: Long Short-Term Attention for Egocentric Action Recognition
    Sudhakaran, Swathikiran
    Escalera, Sergio
    Lanz, Oswald
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 9946 - 9955
  • [9] Recurrent Spatiotemporal Feature Learning for Action Recognition
    Chen, Ze
    Lu, Hongtao
    [J]. ICRAI 2018: PROCEEDINGS OF 2018 4TH INTERNATIONAL CONFERENCE ON ROBOTICS AND ARTIFICIAL INTELLIGENCE, 2018, : 12 - 17
  • [10] NON-LOCAL SPATIOTEMPORAL CORRELATION ATTENTION FOR ACTION RECOGNITION
    Ha, Manh-Hung
    Chen, Oscal Tzyh-Chiang
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (IEEE ICMEW 2022), 2022,