Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild

Cited by: 69
Authors
Jin, Haibo [1]
Liao, Shengcai [1]
Shao, Ling [1, 2]
Affiliations
[1] Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
[2] Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates
Keywords
Facial landmark detection; Pixel-in-pixel regression; Self-training with curriculum; Unsupervised domain adaptation; Representation; Network
DOI
10.1007/s11263-021-01521-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Heatmap regression models have recently become popular because of their superior performance in locating facial landmarks. However, three major problems remain: (1) they are computationally expensive; (2) they usually lack explicit constraints on global shape; and (3) domain gaps are commonly present. To address these problems, we propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection. The proposed model is equipped with a novel detection head based on heatmap regression, which predicts scores and offsets simultaneously on low-resolution feature maps. As a result, repeated upsampling layers are no longer necessary, which greatly reduces inference time without sacrificing accuracy. In addition, a simple but effective neighbor regression module enforces local constraints by fusing predictions from neighboring landmarks, which makes the new detection head more robust. To further improve the cross-domain generalization of PIPNet, we propose self-training with curriculum. This training strategy mines more reliable pseudo-labels from unlabeled data across domains by starting with an easier task and then gradually increasing the difficulty to provide more precise labels. Extensive experiments demonstrate the superiority of PIPNet, which obtains new state-of-the-art results on three out of six popular benchmarks under the supervised setting. Results on two cross-domain test sets also improve consistently over the baselines. Notably, the lightweight version of PIPNet runs at 35.7 FPS on CPU and 200 FPS on GPU while maintaining accuracy competitive with state-of-the-art methods. The code of PIPNet is available at https://github.com/jhb86253817/PIPNet.
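The abstract's idea of predicting a score and an offset on a low-resolution feature map can be illustrated with a minimal decoding sketch. This is not the authors' implementation (see the repository linked above for that). The function name `decode_landmark`, the grid layout, the convention that offsets are fractions of a cell, and the `stride` parameter are all assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one value per low-resolution feature-map cell


def decode_landmark(scores: Grid, off_x: Grid, off_y: Grid, stride: int) -> Tuple[float, float]:
    """Return one landmark's (x, y) in input-image pixels.

    Assumption: offsets are fractions of a cell, measured from the
    cell's top-left corner, and `stride` is the downsampling factor.
    """
    # Coarse step: pick the grid cell with the highest score.
    best_r, best_c = max(
        ((r, c) for r in range(len(scores)) for c in range(len(scores[0]))),
        key=lambda rc: scores[rc[0]][rc[1]],
    )
    # Fine step: add the offset predicted for that cell, then scale
    # back to image resolution. No upsampling of the map is needed.
    x = (best_c + off_x[best_r][best_c]) * stride
    y = (best_r + off_y[best_r][best_c]) * stride
    return x, y


# Toy 2x2 maps for a single landmark (hypothetical values).
scores = [[0.1, 0.2], [0.9, 0.3]]
off_x = [[0.0, 0.0], [0.25, 0.0]]
off_y = [[0.0, 0.0], [0.5, 0.0]]
print(decode_landmark(scores, off_x, off_y, stride=32))  # (8.0, 48.0)
```

In this toy case the highest score is in row 1, column 0, so the landmark lands at x = (0 + 0.25) * 32 = 8 and y = (1 + 0.5) * 32 = 48. The sub-cell precision comes from the offset rather than from a higher-resolution heatmap.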
Pages: 3174-3194
Page count: 21