Attention Flow: End-to-End Joint Attention Estimation

被引:0
|
作者
Sumer, Omer [1 ]
Gerjets, Peter [2 ]
Trautwein, Ulrich [1 ]
Kasneci, Enkelejda [1 ]
机构
[1] Univ Tubingen, Tubingen, Germany
[2] Leibniz Inst Wissensmedien, Tubingen, Germany
关键词
CHILDREN; MODEL;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper addresses the problem of understanding joint attention in third-person social scene videos. Joint attention is the shared gaze behaviour of two or more individuals on an object or an area of interest and has a wide range of applications such as human-computer interaction, educational assessment, treatment of patients with attention disorders, and many more. Our method, Attention Flow, learns joint attention in an end-to-end fashion by using saliency-augmented attention maps and two novel convolutional attention mechanisms that determine to select relevant features and improve joint attention localization. We compare the effect of saliency maps and attention mechanisms and report quantitative and qualitative results on the detection and localization of joint attention in the VideoCoAtt dataset, which contains complex social scenes.
引用
收藏
页码:3316 / 3325
页数:10
相关论文
共 50 条
  • [31] Improving End-to-End SLU performance with Prosodic Attention and Distillation
    Rajaa, Shangeth
    INTERSPEECH 2023, 2023, : 1114 - 1118
  • [32] SWINBERT: End-to-End Transformers with Sparse Attention for Video Captioning
    Lin, Kevin
    Li, Linjie
    Lin, Chung-Ching
    Ahmed, Faisal
    Gan, Zhe
    Liu, Zicheng
    Lu, Yumao
    Wang, Lijuan
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17928 - 17937
  • [33] Self-Attention Transducers for End-to-End Speech Recognition
    Tian, Zhengkun
    Yi, Jiangyan
    Tao, Jianhua
    Bai, Ye
    Wen, Zhengqi
    INTERSPEECH 2019, 2019, : 4395 - 4399
  • [34] End-to-End Chinese Image Text Recognition with Attention Model
    Sheng, Fenfen
    Zhai, Chuanlei
    Chen, Zhineng
    Xu, Bo
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 180 - 189
  • [35] STRUCTURED SPARSE ATTENTION FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Xue, Jiabin
    Zheng, Tieran
    Han, Jiqing
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7044 - 7048
  • [36] END-TO-END NEURAL SPEAKER DIARIZATION WITH SELF-ATTENTION
    Fujita, Yusuke
    Kanda, Naoyuki
    Horiguchi, Shota
    Xue, Yawen
    Nagamatsu, Kenji
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 296 - 303
  • [37] A Novel End-to-End Image Caption Based on Multimodal Attention
    Li X.-M.
    Yue G.
    Chen G.-W.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2020, 49 (06): : 867 - 874
  • [38] Improved training of end-to-end attention models for speech recognition
    Zeyer, Albert
    Irie, Kazuki
    Schlueter, Ralf
    Ney, Hermann
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 7 - 11
  • [39] End-to-End Point Cloud Completion Network with Attention Mechanism
    Li, Yaqin
    Han, Binbin
    Zeng, Shan
    Xu, Shengyong
    Yuan, Cao
    SENSORS, 2022, 22 (17)
  • [40] An End-to-End Lane Detection Model with Attention and Residual Block
    Wang, Bo
    Yan, Xiaoting
    Li, Deguang
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022