A Modular Multimodal Architecture for Gaze Target Prediction: Application to Privacy-Sensitive Settings

Cited by: 3
Authors
Gupta, Anshul [1]
Tafasca, Samy
Odobez, Jean-Marc
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
Funding
Swiss National Science Foundation
Keywords
Attention
DOI
10.1109/CVPRW56347.2022.00552
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Predicting where a person is looking is a complex task. It requires understanding not only the person's gaze and the scene content, but also the 3D scene structure and the person's situation (are they manipulating? interacting or observing others? attentive?) in order to detect obstructions in the line of sight or to apply the attention priors that humans typically have when observing others. In this paper, we hypothesize that such priors can be better identified and leveraged by exploiting explicitly derived multimodal cues such as depth and pose. We thus propose a modular multimodal architecture that combines these cues using an attention mechanism. The architecture can naturally be exploited in privacy-sensitive situations such as surveillance and health, where personally identifiable information cannot be released. We perform extensive experiments on the public GazeFollow and VideoAttentionTarget datasets, obtaining state-of-the-art performance and demonstrating very competitive results in the privacy-sensitive setting.
Pages: 5037-5046
Number of pages: 10