End-to-End Human-Gaze-Target Detection with Transformers

Cited by: 14
|
Authors
Tu, Danyang [1 ]
Min, Xiongkuo [1 ]
Duan, Huiyu [1 ]
Guo, Guodong [2 ]
Zhai, Guangtao [1 ]
Shen, Wei [3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Commun & Network Engn, Shanghai, Peoples R China
[2] Baidu Res, Inst Deep Learning, Beijing, Peoples R China
[3] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Funding
Natural Science Foundation of Shanghai;
Keywords
DOI
10.1109/CVPR52688.2022.00224
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose an effective and efficient method for Human-Gaze-Target (HGT) detection, i.e., gaze following. Current approaches decouple the HGT detection task into separate branches of salient object detection and human gaze prediction, employing a two-stage framework where human head locations must first be detected and then fed into a subsequent gaze target prediction sub-network. In contrast, we redefine the HGT detection task as detecting human head locations and their gaze targets simultaneously. In this way, our method, named Human-Gaze-Target detection TRansformer or HGTTR, streamlines the HGT detection pipeline by eliminating all additional components. HGTTR reasons about the relations of salient objects and human gaze from the global image context. Moreover, unlike existing two-stage methods that require human head locations as input and can predict only one person's gaze target at a time, HGTTR can directly predict the locations of all people and their gaze targets at once in an end-to-end manner. The effectiveness and robustness of our proposed method are verified by extensive experiments on two standard benchmark datasets, GazeFollowing and VideoAttentionTarget. Without bells and whistles, HGTTR outperforms existing state-of-the-art methods by large margins (6.4 mAP gain on GazeFollowing and 10.3 mAP gain on VideoAttentionTarget) with a much simpler architecture.
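The abstract describes a DETR-style set-prediction formulation: each query jointly outputs a head box and a gaze target, and the prediction set is matched to ground truth as a whole, with no separate head-detection stage. The following is a minimal, hypothetical sketch of that matching idea only; the field names, cost terms, and brute-force matcher are illustrative simplifications, not the authors' implementation.

```python
# Hypothetical sketch: each "query" jointly predicts a head box and a gaze
# point, and the prediction set is matched to ground-truth people as a set,
# mirroring the one-stage formulation described in the abstract.
from itertools import permutations


def pair_cost(pred, gt):
    """L1 cost over the joint (head box, gaze point) prediction."""
    head_cost = sum(abs(p - g) for p, g in zip(pred["head"], gt["head"]))
    gaze_cost = sum(abs(p - g) for p, g in zip(pred["gaze"], gt["gaze"]))
    return head_cost + gaze_cost


def match(preds, gts):
    """Brute-force bipartite matching (fine for a handful of people).

    Returns the tuple of prediction indices assigned to ground-truth
    people (in ground-truth order) and the total matching cost.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[i], gts[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost


# Two queries, each carrying a head box (x1, y1, x2, y2) and gaze point (x, y),
# in normalized image coordinates.
preds = [
    {"head": (0.1, 0.1, 0.2, 0.2), "gaze": (0.8, 0.8)},
    {"head": (0.5, 0.5, 0.6, 0.6), "gaze": (0.3, 0.9)},
]
gts = [
    {"head": (0.5, 0.5, 0.6, 0.6), "gaze": (0.3, 0.9)},
    {"head": (0.1, 0.1, 0.2, 0.2), "gaze": (0.8, 0.8)},
]
assignment, cost = match(preds, gts)
print(assignment, cost)  # prints (1, 0) 0.0 — each query matched to its person
```

In the actual paper the matching would be computed with the Hungarian algorithm over learned classification and regression costs; the brute-force search above only makes the set-based pairing explicit.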
Pages: 2192 - 2200
Number of pages: 9
Related Papers
50 records in total
  • [1] HOTR: End-to-End Human-Object Interaction Detection with Transformers
    Kim, Bumsoo
    Lee, Junhyun
    Kang, Jaewoo
    Kim, Eun-Sol
    Kim, Hyunwoo J.
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 74 - 83
  • [2] MGTR: End-to-End Mutual Gaze Detection with Transformer
    Guo, Hang
    Hu, Zhengxi
    Liu, Jingtai
    [J]. COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 363 - 378
  • [3] VPDETR: End-to-End Vanishing Point DEtection TRansformers
    Chen, Taiyan
    Ying, Xianghua
    Yang, Jinfa
    Wang, Ruibin
    Guo, Ruohao
    Xing, Bowei
    Shi, Ji
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1192 - 1200
  • [4] End-to-End Human Pose and Mesh Reconstruction with Transformers
    Lin, Kevin
    Wang, Lijuan
    Liu, Zicheng
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1954 - 1963
  • [5] VRDFormer: End-to-End Video Visual Relation Detection with Transformers
    Zheng, Sipeng
    Chen, Shizhe
    Jin, Qin
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18814 - 18824
  • [6] Deeply Tensor Compressed Transformers for End-to-End Object Detection
    Zhen, Peining
    Gao, Ziyang
    Hou, Tianshu
    Cheng, Yuan
    Chen, Hai-Bao
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 4716 - 4724
  • [7] End-to-end Symbolic Regression with Transformers
    Kamienny, Pierre-Alexandre
    d'Ascoli, Stephane
    Lample, Guillaume
    Charton, Francois
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [8] DTITR: End-to-end drug-target binding affinity prediction with transformers
    Monteiro, Nelson R. C.
    Oliveira, Jose L.
    Arrais, Joel P.
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 147
  • [9] End-to-End Video Object Detection with Spatial-Temporal Transformers
    He, Lu
    Zhou, Qianyu
    Li, Xiangtai
    Niu, Li
    Cheng, Guangliang
    Li, Xiao
    Liu, Wenxuan
    Tong, Yunhai
    Ma, Lizhuang
    Zhang, Liqing
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1507 - 1516
  • [10] CurT: End-to-End Text Line Detection in Historical Documents with Transformers
    Kiessling, Benjamin
    [J]. FRONTIERS IN HANDWRITING RECOGNITION, ICFHR 2022, 2022, 13639 : 34 - 48