Multimodal Across Domains Gaze Target Detection

Cited by: 2
Authors
Tonini, Francesco [1 ]
Beyan, Cigdem [1 ]
Ricci, Elisa [1 ,2 ]
Affiliations
[1] Univ Trento, Dept Informat Engn & Comp Sci, Trento, Italy
[2] Fdn Bruno Kessler, Deep Visual Learning Res Grp, Trento, Italy
Keywords
Gaze target detection; gaze following; domain adaptation; RGB image; depth map; multimodal data;
DOI
10.1145/3536221.3556624
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
This paper addresses the gaze target detection problem in single images captured from a third-person perspective. We present a multimodal deep architecture that infers where a person in a scene is looking. This spatial model is trained on head images of the person of interest, together with scene images and depth maps that provide rich contextual information. Unlike several prior works, our model requires no supervision of gaze angles and does not rely on head orientation information or the location of the person's eyes. Extensive experiments demonstrate the stronger performance of our method on multiple benchmark datasets. We also investigate several variations of our method obtained by altering the joint learning of the multimodal data; some of these variations outperform prior works as well. For the first time, we examine domain adaptation for gaze target detection, and we equip our multimodal network to effectively handle the domain gap across datasets. The code of the proposed method is available at https://github.com/francescotonini/multimodal-across-domains-gaze-target-detection.
Pages: 420-431
Page count: 12
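The abstract describes a three-branch model that fuses features from a head crop, the scene image, and a depth map to predict where in the scene the person is looking. Below is a minimal toy sketch of that fusion idea only, assuming placeholder mean-pooling encoders and random projection weights; it is not the authors' actual architecture, for which see the linked repository.

```python
import numpy as np

def encode(x):
    # Placeholder feature extractor: global average over spatial dims.
    # In the real model each branch would be a CNN backbone.
    return x.mean(axis=(0, 1))

def gaze_heatmap(head_crop, scene_rgb, depth_map, grid=7):
    """Toy three-branch fusion: head, scene, and depth features are
    concatenated and projected to a spatial grid of gaze likelihoods."""
    feats = np.concatenate(
        [encode(head_crop), encode(scene_rgb), encode(depth_map)]
    )
    rng = np.random.default_rng(0)  # hypothetical fixed projection weights
    w = rng.standard_normal((grid * grid, feats.size))
    logits = w @ feats
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()            # softmax over grid cells
    return probs.reshape(grid, grid)

heat = gaze_heatmap(
    head_crop=np.random.rand(64, 64, 3),
    scene_rgb=np.random.rand(224, 224, 3),
    depth_map=np.random.rand(224, 224, 1),
)
```

The output is a normalized heatmap over a coarse spatial grid; the cell with the highest probability would be taken as the predicted gaze target.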
Related papers (50 total)
  • [31] The Multimodal Antidepressant Vortioxetine Restores Cognitive Function in Preclinical Models across Several Cognitive Domains
    Sanchez, Connie
    Pehrson, Alan L.
    Li, Yan
    Haddjeri, Nasser
    Gulinello, Maria
    Artigas, Francesc
    [J]. NEUROPSYCHOPHARMACOLOGY, 2014, 39 : S116 - S117
  • [32] Multimodal Human Attention Detection for Reading from Facial Expression, Eye Gaze, and Mouse Dynamics
    Li, Jiajia
    Ngai, Grace
    Leong, Hong Va
    Chan, Stephen C. F.
    [J]. APPLIED COMPUTING REVIEW, 2016, 16 (03): : 37 - 49
  • [33] PressTapFlick: Exploring a gaze and foot-based multimodal approach to gaze typing
    Rajanna, Vijay
    Russel, Murat
    Zhao, Jeffrey
    Hammond, Tracy
    [J]. INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES, 2022, 161
  • [34] Target Detection Method in Multimodal Images with Complex Backgrounds and Different Views
    He Zhixiang
    Ding Xiaoqing
    [J]. INTELLIGENT ROBOTS AND COMPUTER VISION XXVIII: ALGORITHMS AND TECHNIQUES, 2011, 7878
  • [35] Multimodal Driver Interaction with Gesture, Gaze and Speech
    Aftab, Abdul Rafey
    [J]. ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2019, : 487 - 492
  • [36] Target response controlled enzyme activity switch for multimodal biosensing detection
    Zhang, Lu
    Wu, Haiping
    Chen, Yirong
    Zhang, Songzhi
    Song, Mingxuan
    Liu, Changjin
    Li, Jia
    Cheng, Wei
    Ding, Shijia
    [J]. JOURNAL OF NANOBIOTECHNOLOGY, 2023, 21 (01)
  • [38] Aircraft target detection using multimodal satellite-based data
    Yu, Lingling
    Yang, Qingxiang
    Dong, Limin
    [J]. SIGNAL PROCESSING, 2019, 155 : 358 - 367
  • [39] The importance of gaze and gesture in interactive multimodal explanation
    Lund, Kristine
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2007, 41 (3-4) : 289 - 303
  • [40] A Multimodal Gaze-Controlled Virtual Keyboard
    Cecotti, Hubert
    [J]. IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2016, 46 (04) : 601 - 606