Exploiting enhanced and robust RGB-D face representation via progressive multi-modal learning

Cited by: 1
Authors
Zhu, Yizhe [1 ,2 ]
Gao, Jialin [1 ,2 ]
Wu, Tianshu [2 ]
Liu, Qiong [2 ]
Zhou, Xi [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai 200240, Peoples R China
[2] CloudWalk Technol, Shanghai 201203, Peoples R China
Keywords
RGB-D face recognition; Multi-modal fusion; Depth enhancement; Multi-head-attention mechanism; Incomplete modal data; ATTENTION
DOI
10.1016/j.patrec.2022.12.027
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Existing RGB-based 2D face recognition approaches are sensitive to facial variations, posture, occlusions, and illumination. Depth-based methods have been shown to alleviate this sensitivity by introducing geometric information, but they rely heavily on high-quality depth maps from high-cost RGB-D cameras. To this end, we propose a Progressive Multi-modal Fusion framework that exploits an enhanced and robust face representation for RGB-D face recognition with low-cost RGB-D cameras and also handles incomplete RGB-D modal data. To address defects such as holes introduced by low-cost cameras, we first design a depth enhancement module that refines the low-quality depth maps and corrects depth inaccuracies. We then extract and aggregate augmented feature maps of the RGB and depth modalities step by step. Subsequently, a masked modeling scheme and an iterative inter-modal feature interaction module fully exploit the implicit relations between the two modalities. Comprehensive experiments on four challenging benchmark databases verify the superior performance and robustness of the proposed solution over other face recognition approaches. (c) 2022 Elsevier B.V. All rights reserved.
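The abstract describes the framework only at a high level (cross-modal interaction of RGB and depth features via a multi-head-attention mechanism); the paper's implementation details are not part of this record. Purely as an illustration of that general idea, the minimal sketch below lets RGB and depth feature tokens attend to each other with multi-head attention and merges the results. The module name CrossModalFusion and all shapes and hyperparameters are hypothetical assumptions, not the authors' code.

```python
# Hypothetical sketch of cross-modal RGB-depth feature interaction with
# multi-head attention (PyTorch). Illustrative only; not the paper's method.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Let each modality attend to the other, then merge the enhanced tokens."""

    def __init__(self, embed_dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.rgb_attends_depth = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.depth_attends_rgb = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.merge = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, rgb_tokens: torch.Tensor, depth_tokens: torch.Tensor) -> torch.Tensor:
        # rgb_tokens, depth_tokens: (batch, num_tokens, embed_dim)
        rgb_enhanced, _ = self.rgb_attends_depth(rgb_tokens, depth_tokens, depth_tokens)
        depth_enhanced, _ = self.depth_attends_rgb(depth_tokens, rgb_tokens, rgb_tokens)
        fused = torch.cat([rgb_enhanced, depth_enhanced], dim=-1)
        return self.merge(fused)  # (batch, num_tokens, embed_dim)


if __name__ == "__main__":
    rgb = torch.randn(2, 49, 256)    # e.g., a 7x7 feature map flattened into 49 tokens
    depth = torch.randn(2, 49, 256)
    print(CrossModalFusion()(rgb, depth).shape)  # torch.Size([2, 49, 256])
```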
Pages: 38-45
Number of pages: 8
Related papers
50 records in total
  • [21] Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection
    Wu, Jiajia
    Han, Guangliang
    Wang, Haining
    Yang, Hang
    Li, Qingqing
    Liu, Dongxu
    Ye, Fangjian
    Liu, Peixun
    IEEE ACCESS, 2021, 9 : 150608 - 150622
  • [22] Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection
    Liang, Yanhua
    Qin, Guihe
    Sun, Minghui
    Qin, Jun
    Yan, Jie
    Zhang, Zhonghan
    NEUROCOMPUTING, 2022, 490 : 132 - 145
  • [23] Lightweight Multi-modal Representation Learning for RGB Salient Object Detection
    Xiao, Yun
    Huang, Yameng
    Li, Chenglong
    Liu, Lei
    Zhou, Aiwu
    Tang, Jin
    COGNITIVE COMPUTATION, 2023, 15 (06) : 1868 - 1883
  • [25] RGB-D Salient Object Detection Based on Multi-Modal Feature Interaction
    Gao, Yue
    Dai, Meng
    Zhang, Qing
    Computer Engineering and Applications, 2024, 60 (02) : 211 - 220
  • [26] A Multi-Modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling
    Asif, Umar
    Bennamoun, Mohammed
    Sohel, Ferdous A.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (09) : 2051 - 2065
  • [27] Learning a deeply supervised multi-modal RGB-D embedding for semantic scene and object category recognition
    Zaki, Hasan F. M.
    Shafait, Faisal
    Mian, Ajmal
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2017, 92 : 41 - 52
  • [28] Enhanced Topic Modeling with Multi-modal Representation Learning
    Zhang, Duoyi
    Wang, Yue
    Abul Bashar, Md
    Nayak, Richi
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2023, PT I, 2023, 13935 : 393 - 404
  • [29] MMPL-Net: multi-modal prototype learning for one-shot RGB-D segmentation
    Shan, Dexing
    Zhang, Yunzhou
    Liu, Xiaozheng
    Liu, Shitong
    Coleman, Sonya A.
    Kerr, Dermot
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (14): : 10297 - 10310