Integrating Human Gaze into Attention for Egocentric Activity Recognition

Cited: 17
Authors
Min, Kyle [1 ]
Corso, Jason J. [1 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
DOI
10.1109/WACV48630.2021.00111
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is well known that human gaze carries significant information about visual attention. However, three main difficulties arise when incorporating gaze data into an attention mechanism of deep neural networks: (i) gaze fixation points are likely to contain measurement errors due to blinking and rapid eye movements; (ii) it is unclear when, and to what extent, gaze data correlates with visual attention; and (iii) gaze data is not always available in real-world situations. In this work, we introduce an effective probabilistic approach to integrating human gaze into spatiotemporal attention for egocentric activity recognition. Specifically, we represent the locations of gaze fixation points as structured discrete latent variables to model their uncertainty, and we model the distribution of gaze fixations with a variational method. Because the gaze distribution is learned during training, ground-truth gaze annotations are no longer needed at test time: gaze locations are instead predicted from the learned distribution and used as informative attentional cues to improve recognition performance. Our method outperforms all previous state-of-the-art approaches on EGTEA, a large-scale egocentric activity recognition dataset provided with gaze measurements. We also perform an ablation study and qualitative analysis to demonstrate that our attention mechanism is effective.
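As a rough illustration of the attention step the abstract describes (not the paper's actual model), a predicted gaze distribution over spatial locations can act as attention weights on a feature map: the attended feature is the expectation of the spatial features under that distribution. A minimal NumPy sketch, with all names, shapes, and the softmax parameterization being illustrative assumptions:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits.
    e = np.exp(x - x.max())
    return e / e.sum()

def gaze_weighted_pool(features, gaze_logits):
    """Pool a spatial feature map with a predicted gaze distribution.

    features:    (H*W, C) array of spatial features.
    gaze_logits: (H*W,) unnormalized scores from a gaze predictor
                 (hypothetical stand-in for the learned gaze model).
    Returns the (C,) expectation of the features under the gaze
    distribution, i.e. a gaze-attended feature vector.
    """
    attn = softmax(gaze_logits)   # discrete distribution over locations
    return attn @ features        # expectation of features under gaze

rng = np.random.default_rng(0)
feats = rng.normal(size=(49, 8))    # e.g. a 7x7 spatial grid, 8 channels
logits = rng.normal(size=49)        # stand-in for predicted gaze scores
pooled = gaze_weighted_pool(feats, logits)   # shape (8,)
```

In this toy view, training a gaze predictor end-to-end (as the paper does with a variational method) replaces the need for ground-truth gaze at test time: the logits come from the model rather than from an eye tracker.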
Pages: 1068-1077
Page count: 10
Related Papers
50 items in total
  • [31] Wang, Xuanhan; Gao, Lianli; Song, Jingkuan; Zhen, Xiantong; Sebe, Nicu; Shen, Heng Tao. "Deep appearance and motion learning for egocentric activity recognition." Neurocomputing, 2018, 275: 438-447.
  • [32] Kumar, K. P. Sanal; Bhavani, R. "Analysis of SVM and kNN Classifiers for Egocentric Activity Recognition." Proceedings of the International Conference on Informatics and Analytics (ICIA '16), 2016.
  • [33] Zhong, Chengzhang; Reibman, Amy R.; Cordoba, Hansel Mina; Deering, Amanda J. "Hand-hygiene activity recognition in egocentric video." 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP 2019), 2019.
  • [34] Surie, Dipak; Pederson, Thomas; Lagriffoul, Fabien; Janlert, Lars-Erik; Sjolie, Daniel. "Activity recognition using an egocentric perspective of everyday objects." Ubiquitous Intelligence and Computing, Proceedings, 2007, 4611: 246+.
  • [35] Li, Yin; Fathi, Alireza; Rehg, James M. "Learning to Predict Gaze in Egocentric Video." 2013 IEEE International Conference on Computer Vision (ICCV), 2013: 3216-3223.
  • [36] Huang, Yi; Yang, Xiaoshan; Gao, Junyu; Sang, Jitao; Xu, Changsheng. "Knowledge-driven Egocentric Multimodal Activity Recognition." ACM Transactions on Multimedia Computing, Communications, and Applications, 2021, 16 (04).
  • [37] Yan, Yan; Ricci, Elisa; Liu, Gaowen; Sebe, Nicu. "Egocentric Daily Activity Recognition via Multitask Clustering." IEEE Transactions on Image Processing, 2015, 24 (10): 2984-2995.
  • [38] Naas, Si-Ahmed; Jiang, Xiaolan; Sigg, Stephan; Ji, Yusheng. "Functional Gaze Prediction in Egocentric Video." MoMM 2020: The 18th International Conference on Advances in Mobile Computing & Multimedia, 2020: 40-47.
  • [39] Gajewski, Daniel A.; Wallin, Courtney P.; Philbeck, John W. "Gaze behavior and the perception of egocentric distance." Journal of Vision, 2014, 14 (01).
  • [40] Gajewski, Daniel A.; Wallin, Courtney P.; Philbeck, John W. "Gaze direction and the extraction of egocentric distance." Attention, Perception, & Psychophysics, 2014, 76: 1739-1751.