Exploiting enhanced and robust RGB-D face representation via progressive multi-modal learning

Cited by: 1
Authors
Zhu, Yizhe [1 ,2 ]
Gao, Jialin [1 ,2 ]
Wu, Tianshu [2 ]
Liu, Qiong [2 ]
Zhou, Xi [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai 200240, Peoples R China
[2] CloudWalk Technol, Shanghai 201203, Peoples R China
Keywords
RGB-D face recognition; Multi-modal fusion; Depth enhancement; Multi-head attention mechanism; Incomplete modal data
DOI
10.1016/j.patrec.2022.12.027
CLC Number
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing RGB-based 2D face recognition approaches are sensitive to facial variations, posture, occlusions, and illumination. Current depth-based methods have been shown to alleviate this sensitivity by introducing geometric information, but they rely heavily on high-quality depth maps from high-cost RGB-D cameras. To this end, we propose a Progressive Multi-modal Fusion framework that exploits an enhanced and robust face representation for RGB-D facial recognition with low-cost RGB-D cameras, and that also handles incomplete RGB-D modal data. Because low-cost cameras introduce defects such as holes, we first design a depth enhancement module to refine the low-quality depth and correct depth inaccuracies. Then, we extract and aggregate augmented feature maps of the RGB and depth modalities step-by-step. Subsequently, a masked modeling scheme and an iterative inter-modal feature interaction module fully exploit the implicit relations between the two modalities. We perform comprehensive experiments to verify the superior performance and robustness of the proposed solution over other FR approaches on four challenging benchmark databases. (c) 2022 Elsevier B.V. All rights reserved.
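The inter-modal feature interaction described in the abstract is built on a multi-head attention mechanism. As a rough illustration only (this is a hypothetical NumPy sketch with random weights standing in for learned parameters, not the authors' implementation), RGB feature tokens can attend to depth feature tokens and merge the result through a residual connection:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(rgb, depth, num_heads=4):
    """RGB tokens (queries) attend to depth tokens (keys/values).

    rgb:   (n_rgb, d) feature tokens from the RGB branch
    depth: (n_depth, d) feature tokens from the depth branch
    Returns fused features of shape (n_rgb, d).
    """
    n, d = rgb.shape
    assert d % num_heads == 0
    dh = d // num_heads
    rng = np.random.default_rng(0)
    # Random projections stand in for learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # Split the feature dimension across heads: (heads, tokens, dh).
    q = (rgb @ Wq).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k = (depth @ Wk).reshape(-1, num_heads, dh).transpose(1, 0, 2)
    v = (depth @ Wv).reshape(-1, num_heads, dh).transpose(1, 0, 2)
    # Scaled dot-product attention per head over depth tokens.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return rgb + out  # residual connection keeps the RGB stream intact

rgb_feats = np.random.default_rng(1).standard_normal((16, 32))
depth_feats = np.random.default_rng(2).standard_normal((16, 32))
fused = cross_modal_attention(rgb_feats, depth_feats)
print(fused.shape)  # (16, 32)
```

In the paper's iterative scheme this interaction is applied in both directions and repeated over several steps; the sketch above shows only a single RGB-to-depth attention pass.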
Pages: 38-45 (8 pages)
Related Papers
50 records
  • [31] Hierarchical multi-modal fusion FCN with attention model for RGB-D tracking
    Jiang, Ming-xin
    Deng, Chao
    Shan, Jing-song
    Wang, Yuan-yuan
    Jia, Yin-jie
    Sun, Xing
    INFORMATION FUSION, 2019, 50 : 1 - 8
  • [32] Eulerian Magnification of Multi-Modal RGB-D Video for Heart Rate Estimation
    Dosso, Yasmina Souley
    Bekele, Amente
    Green, James R.
    2018 IEEE INTERNATIONAL SYMPOSIUM ON MEDICAL MEASUREMENTS AND APPLICATIONS (MEMEA), 2018, : 642 - 647
  • [33] Multi-modal deep learning for Fuji apple detection using RGB-D cameras and their radiometric capabilities
    Gene-Mola, Jordi
    Vilaplana, Veronica
    Rosell-Polo, Joan R.
    Morros, Josep-Ramon
    Ruiz-Hidalgo, Javier
    Gregorio, Eduard
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2019, 162 : 689 - 698
  • [34] BMFNet: Bifurcated multi-modal fusion network for RGB-D salient object detection
    Sun, Chenwang
    Zhang, Qing
    Zhuang, Chenyu
    Zhang, Mingqian
    IMAGE AND VISION COMPUTING, 2024, 147
  • [35] MAPNet: Multi-modal attentive pooling network for RGB-D indoor scene classification
    Li, Yabei
    Zhang, Zhang
    Cheng, Yanhua
    Wang, Liang
    Tan, Tieniu
    PATTERN RECOGNITION, 2019, 90 : 436 - 449
  • [36] DMFNet: Deep Multi-Modal Fusion Network for RGB-D Indoor Scene Segmentation
    Yuan, Jianzhong
    Zhou, Wujie
    Luo, Ting
    IEEE ACCESS, 2019, 7 : 169350 - 169358
  • [37] Computer catwalk: A multi-modal deep network for the segmentation of RGB-D images of clothes
    Joukovsky, B.
    Hu, P.
    Munteanu, A.
    ELECTRONICS LETTERS, 2020, 56 (09)
  • [38] RGB-D Face Recognition via Learning-based Reconstruction
    Chowdhury, Anurag
    Ghosh, Soumyadeep
    Singh, Richa
    Vatsa, Mayank
    2016 IEEE 8TH INTERNATIONAL CONFERENCE ON BIOMETRICS THEORY, APPLICATIONS AND SYSTEMS (BTAS), 2016,
  • [39] Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection
    Gao, Wei
    Liao, Guibiao
    Ma, Siwei
    Li, Ge
    Liang, Yongsheng
    Lin, Weisi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2091 - 2106
  • [40] Multi-modal Network Representation Learning
    Zhang, Chuxu
    Jiang, Meng
    Zhang, Xiangliang
    Ye, Yanfang
    Chawla, Nitesh V.
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 3557 - 3558