Orientation Cues-Aware Facial Relationship Representation for Head Pose Estimation via Transformer

Cited by: 74
Authors
Liu, Hai [1]
Zhang, Cheng [1]
Deng, Yongjian [2]
Liu, Tingting [3, 4]
Zhang, Zhaoli [1]
Li, You-Fu [4]
Affiliations
[1] Cent China Normal Univ, Natl Engn Res Ctr Elearning, Wuhan 430079, Peoples R China
[2] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[3] Hubei Univ, Sch Educ, Wuhan 430062, Hubei, Peoples R China
[4] City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China
Keywords
Head; Transformers; Visualization; Computer architecture; Pose estimation; Task analysis; Semantics; Head pose estimation; attention mechanism; relationship perception; deep learning; transformer
DOI
10.1109/TIP.2023.3331309
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Head pose estimation (HPE) is an indispensable upstream task in human-machine interaction, self-driving, and attention detection. However, practical head pose applications face several challenges, such as severe occlusion, low illumination, and extreme orientations. To address these challenges, we identify three cues from head images: critical minority relationships, neighborhood orientation relationships, and significant facial changes. On the basis of these three cues, two key insights on head poses are revealed: 1) the intra-orientation relationship and 2) the cross-orientation relationship. To leverage these two insights, a novel relationship-driven method based on the Transformer architecture is proposed, in which facial and orientation relationships can be learned. Specifically, we design several orientation tokens to explicitly encode basic orientation regions. In addition, a novel token guide multi-loss function is designed to guide the orientation tokens as they learn the desired regional similarities and relationships. Experimental results on three challenging benchmark HPE datasets show that the proposed TokenHPE achieves state-of-the-art performance. Moreover, qualitative visualizations are provided to verify the effectiveness of the token-learning methodology.
Pages: 6289-6302
Page count: 14