Scalable multimodal approach for face generation and super-resolution using a conditional diffusion model

Authors
Ahmed Abotaleb [1]
Mohamed W. Fakhr [1]
Mohamed Zaki [2]
Affiliations
[1] Arab Academy for Science, Technology and Maritime Transport, Computer Engineering Department
[2] Al-Azhar University, College of Engineering
Keywords
Scalable multimodal approach; Speech conditioned face generation; Speech conditioned face super-resolution; Diffusion probabilistic models; Speaker embeddings
DOI
10.1038/s41598-024-76407-9
Abstract
Multimodal conditioned face image generation and face super-resolution are significant areas of research. This paper uses diffusion models as the primary engine for both tasks and presents two main contributions: (1) “Speaking the Language of Faces” (SLF), a flexible, modular, fusion-less and architecturally simple multimodal system; and (2) a scalability scheme and a sensitivity analysis that can assist practitioners in system parameter estimation and feature selection. SLF consists of two main components: a feature vector generator (encoder) and an image generator (decoder) built on a conditional diffusion model. SLF can accept various inputs, including low-resolution images, speech signals, person attributes (age, gender, ethnicity), or any combination of these. Scalability is achieved through conditional scale values. The implementation of SLF has confirmed its versatility (e.g., speech-to-face image generation and conditioned face super-resolution). Multiple versions of the system were trained to conduct a sensitivity analysis and to determine the influence of each individual feature on the output image. The analysis shows that speaker embeddings are sufficient audio features for this task. The effects of audio signals are profound and more pronounced than those of the low-resolution images (8 × 8), whose effects are still significant. The effects of gender, ethnicity and age were found to be moderate. Finally, conditional scale values significantly impact the system’s behavior and performance.
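The abstract does not define how a conditional scale value enters the diffusion model. As a minimal sketch, assuming it behaves like the guidance scale in classifier-free guidance (an assumption, not something stated in this record), the scale would blend an unconditional and a conditional noise prediction at each denoising step. The function name `apply_conditional_scale` and its arguments are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def apply_conditional_scale(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    cond_scale: float,
) -> List[float]:
    """Blend two noise predictions, classifier-free-guidance style.

    ASSUMPTION: the paper's "conditional scale" is treated here as a
    guidance scale; the abstract itself does not confirm this.

    cond_scale = 0 returns the unconditional prediction,
    cond_scale = 1 returns the conditional prediction,
    cond_scale > 1 pushes the result further toward the condition.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    return [u + cond_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    # Toy two-element "noise predictions" for illustration only.
    uncond = [0.0, 1.0]
    cond = [1.0, 3.0]
    for scale in (0.0, 1.0, 2.0):
        print(scale, apply_conditional_scale(uncond, cond, scale))
```

Under this reading, a larger scale makes the conditioning inputs (speech, attributes, low-resolution image) weigh more heavily on the generated face, which is one plausible way a scale value could "significantly impact the system's behavior".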