Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild

Cited by: 69
Authors
Jin, Haibo [1]
Liao, Shengcai [1]
Shao, Ling [1, 2]
Affiliations
[1] Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
[2] Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates
Keywords
Facial landmark detection; Pixel-in-pixel regression; Self-training with curriculum; Unsupervised domain adaptation; REPRESENTATION; NETWORK;
DOI
10.1007/s11263-021-01521-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, heatmap regression models have become popular due to their superior performance in locating facial landmarks. However, three major problems still exist among these models: (1) they are computationally expensive; (2) they usually lack explicit constraints on global shapes; (3) domain gaps are commonly present. To address these problems, we propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection. The proposed model is equipped with a novel detection head based on heatmap regression, which conducts score and offset predictions simultaneously on low-resolution feature maps. By doing so, repeated upsampling layers are no longer necessary, enabling the inference time to be largely reduced without sacrificing model accuracy. Besides, a simple but effective neighbor regression module is proposed to enforce local constraints by fusing predictions from neighboring landmarks, which enhances the robustness of the new detection head. To further improve the cross-domain generalization capability of PIPNet, we propose self-training with curriculum. This training strategy is able to mine more reliable pseudo-labels from unlabeled data across domains by starting with an easier task, then gradually increasing the difficulty to provide more precise labels. Extensive experiments demonstrate the superiority of PIPNet, which obtains new state-of-the-art results on three out of six popular benchmarks under the supervised setting. The results on two cross-domain test sets are also consistently improved compared to the baselines. Notably, our lightweight version of PIPNet runs at 35.7 FPS and 200 FPS on CPU and GPU, respectively, while still maintaining a competitive accuracy to state-of-the-art methods. The code of PIPNet is available at https://github.com/jhb86253817/PIPNet.
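The score-plus-offset idea in the abstract can be illustrated with a minimal decoding sketch. It is not taken from the PIPNet repository. It assumes that each landmark has one low-resolution score map and two offset maps whose values are fractions of a grid cell, and that `stride` is the downsampling factor between the input image and the feature map. The function name `decode_landmarks` and its parameters are hypothetical.

```python
def decode_landmarks(scores, offsets_x, offsets_y, stride):
    """Decode (x, y) image coordinates from low-resolution predictions.

    Assumed layout (hypothetical, for illustration only):
      scores[k][r][c]    - score of landmark k at grid cell (row r, col c)
      offsets_x[k][r][c] - horizontal offset inside that cell, in [0, 1)
      offsets_y[k][r][c] - vertical offset inside that cell, in [0, 1)
      stride             - input-image pixels covered by one grid cell
    """
    landmarks = []
    for score_map, off_x, off_y in zip(scores, offsets_x, offsets_y):
        height, width = len(score_map), len(score_map[0])
        # Pick the grid cell with the highest score for this landmark.
        best_row, best_col = max(
            ((r, c) for r in range(height) for c in range(width)),
            key=lambda rc: score_map[rc[0]][rc[1]],
        )
        # Refine the coarse cell index with the predicted in-cell offset,
        # then scale back to input-image pixels.
        x = (best_col + off_x[best_row][best_col]) * stride
        y = (best_row + off_y[best_row][best_col]) * stride
        landmarks.append((x, y))
    return landmarks
```

Because the offsets recover sub-cell precision, a sketch like this needs no upsampling back to full resolution, which is the efficiency argument the abstract makes. The neighbor regression module and the curriculum-based self-training are not covered here.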
Pages: 3174-3194
Page count: 21
Related papers
50 in total
  • [1] Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild
    Haibo Jin
    Shengcai Liao
    Ling Shao
    International Journal of Computer Vision, 2021, 129 : 3174 - 3194
  • [2] Does Pixel Value Represent Facial Landmark Well in Heatmap?
    Lan, Xing
    Lyu, Jiayi
    Dong, Kun
    Jiang, Hanyu
    Hu, Qinghao
    Xue, Jian
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 13016 - 13028
  • [3] Towards pixel-to-pixel deep nucleus detection in microscopy images
    Fuyong Xing
    Yuanpu Xie
    Xiaoshuang Shi
    Pingjun Chen
    Zizhao Zhang
    Lin Yang
    BMC Bioinformatics, 20
  • [4] Facial Feature Detection with Optimal Pixel Reduction SVM
    Nguyen, Minh Hoai
    Perez, Joan
    De la Torre, Fernando
    2008 8TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2008), VOLS 1 AND 2, 2008, : 460 - 465
  • [5] Pixel Difference Networks for Efficient Edge Detection
    Su, Zhuo
    Liu, Wenzhe
    Yu, Zitong
    Hu, Dewen
    Liao, Qing
    Tian, Qi
    Pietikainen, Matti
    Liu, Li
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 5097 - 5107
  • [6] EFFICIENT FACIAL LANDMARK DETECTION FOR EMBEDDED SYSTEMS
    Wu, Ji-Jia
    2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024, 2024,
  • [7] Towards pixel-to-pixel deep nucleus detection in microscopy images (vol 20, 472, 2019)
    Xing, Fuyong
    Xie, Yuanpu
    Shi, Xiaoshuang
    Chen, Pingjun
    Zhang, Zizhao
    Yang, Lin
    BMC BIOINFORMATICS, 2019, 20 (01)
  • [8] EFFICIENT DETECTION OF PIXEL-LEVEL ADVERSARIAL ATTACKS
    Shah, Syed Afaq Ali
    Bougre, Moise
    Akhtar, Naveed
    Bennamoun, Mohammed
    Zhang, Liang
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 718 - 722
  • [9] Evolution of pixel level snakes towards an efficient hardware implementation
    Vilarino, David Lopez
    Dudek, Piotr
    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, 2007, : 2678 - +
  • [10] Learning the Face Shape Models for Facial Landmark Detection in the Wild
    Wu, Yue
    Ji, Qiang
    FACE AND FACIAL EXPRESSION RECOGNITION FROM REAL WORLD VIDEOS, 2015, 8912 : 33 - 45