Orientation Cues-Aware Facial Relationship Representation for Head Pose Estimation via Transformer

Cited by: 74
Authors
Liu, Hai [1]
Zhang, Cheng [1]
Deng, Yongjian [2]
Liu, Tingting [3, 4]
Zhang, Zhaoli [1]
Li, You-Fu [4]
Affiliations
[1] Cent China Normal Univ, Natl Engn Res Ctr Elearning, Wuhan 430079, Peoples R China
[2] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[3] Hubei Univ, Sch Educ, Wuhan 430062, Hubei, Peoples R China
[4] City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China
Keywords
Head; Transformers; Visualization; Computer architecture; Pose estimation; Task analysis; Semantics; Head pose estimation; attention mechanism; relationship perception; deep learning; transformer;
DOI
10.1109/TIP.2023.3331309
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Head pose estimation (HPE) is an indispensable upstream task in human-machine interaction, self-driving, and attention detection. However, practical head pose applications face several challenges, such as severe occlusion, low illumination, and extreme orientations. To address these challenges, we identify three cues from head images: critical minority relationships, neighborhood orientation relationships, and significant facial changes. On the basis of these three cues, two key insights on head poses are revealed: 1) the intra-orientation relationship and 2) the cross-orientation relationship. To leverage these two insights, a novel relationship-driven method based on the Transformer architecture is proposed, in which facial and orientation relationships can be learned. Specifically, we design several orientation tokens to explicitly encode basic orientation regions. In addition, a novel token guide multi-loss function is designed to guide the orientation tokens as they learn the desired regional similarities and relationships. Experimental results on three challenging benchmark HPE datasets show that the proposed TokenHPE achieves state-of-the-art performance. Moreover, qualitative visualizations are provided to verify the effectiveness of the token-learning methodology.
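The abstract mentions orientation regions and a multi-loss function without giving details, so the following is only a rough sketch of the general idea, not the authors' implementation. The 3 x 3 region grid, the bin edges, the weight `alpha`, and the use of a mean squared error on Euler angles as the pose term are all assumptions made for illustration; the function names are hypothetical.

```python
import math

# Hypothetical partition: 3 yaw bins x 3 pitch bins = 9 orientation regions.
# The bin edges (in degrees) are illustrative assumptions, not from the paper.
YAW_EDGES = (-30.0, 30.0)
PITCH_EDGES = (-30.0, 30.0)


def _bin(angle, edges):
    """Return the index of the first bin whose upper edge exceeds angle."""
    for i, edge in enumerate(edges):
        if angle < edge:
            return i
    return len(edges)


def orientation_region(yaw, pitch):
    """Map yaw and pitch (degrees) to a region index in the range 0..8."""
    return _bin(pitch, PITCH_EDGES) * (len(YAW_EDGES) + 1) + _bin(yaw, YAW_EDGES)


def cross_entropy(logits, target):
    """Numerically stable cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def multi_loss(pred_angles, true_angles, region_logits, alpha=0.5):
    """Weighted sum of a pose term and a region-classification term.

    pred_angles and true_angles are (yaw, pitch, roll) in degrees.
    The MSE pose term is a stand-in chosen for simplicity (assumption).
    """
    mse = sum((p - t) ** 2 for p, t in zip(pred_angles, true_angles)) / len(true_angles)
    region = orientation_region(true_angles[0], true_angles[1])
    return alpha * mse + (1.0 - alpha) * cross_entropy(region_logits, region)
```

In this sketch the classification term plays the role of a guide signal: it rewards region scores that agree with the region of the ground-truth pose, while the pose term penalizes angle error directly.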
Pages: 6289-6302
Page count: 14