Scalable multimodal approach for face generation and super-resolution using a conditional diffusion model

Cited by: 0
Authors
Ahmed Abotaleb [1]
Mohamed W. Fakhr [1]
Mohamed Zaki [2]
Affiliations
[1] Arab Academy for Science, Technology and Maritime Transport, Computer Engineering Department
[2] Al-Azhar University, College of Engineering
Keywords
Scalable multimodal approach; Speech conditioned face generation; Speech conditioned face super-resolution; Diffusion probabilistic models; Speaker embeddings
DOI
10.1038/s41598-024-76407-9
Abstract
Multimodal conditioned face image generation and face super-resolution are significant areas of research. This paper uses diffusion models as the primary engine for both tasks and presents two main contributions: (1) “Speaking the Language of Faces” (SLF), a flexible, modular, fusion-less, and architecturally simple multimodal system; and (2) a scalability scheme and a sensitivity analysis that can assist practitioners in system parameter estimation and feature selection. SLF consists of two main components: a feature vector generator (encoder) and an image generator (decoder) that uses a conditional diffusion model. SLF can accept various inputs, including low-resolution images, speech signals, person attributes (age, gender, ethnicity), or any combination of these. Scalability is based on conditional scale values. The implementation of SLF confirmed its versatility (e.g., speech-to-face image generation and conditioned face super-resolution). We trained multiple system versions to conduct a sensitivity analysis and to determine the influence of each individual feature on the output image. Speaker embeddings proved to be sufficient audio features for the task. The effects of audio signals were found to be profound and more pronounced than those of the low-resolution images (8 × 8), whose effects are still significant. The effects of gender, ethnicity, and age were found to be moderate. Conditional scale values also significantly affect the system’s behavior and performance.
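The abstract does not say how the optional inputs are combined or how the conditional scale enters the sampler. The sketch below is therefore only an illustration under two assumptions that are not taken from the paper: (a) absent modalities are zero-filled and the rest concatenated into one conditioning vector, and (b) the conditional scale acts as in classifier-free guidance, blending an unconditional and a conditional noise prediction. The names `build_condition` and `guided_noise` and the toy dimensions are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def build_condition(
    speaker_embedding: Optional[Sequence[float]] = None,
    low_res_pixels: Optional[Sequence[float]] = None,
    attributes: Optional[Sequence[float]] = None,
    dims: Tuple[int, int, int] = (4, 6, 3),  # toy sizes, hypothetical
) -> List[float]:
    """Concatenate the inputs that are present; zero-fill the absent ones.

    Assumption: this is one simple way to accept "any combination" of
    inputs; the paper's encoder may work differently.
    """
    vector: List[float] = []
    for part, dim in zip((speaker_embedding, low_res_pixels, attributes), dims):
        if part is None:
            vector.extend([0.0] * dim)
        else:
            if len(part) != dim:
                raise ValueError(f"expected {dim} values, got {len(part)}")
            vector.extend(float(x) for x in part)
    return vector


def guided_noise(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    scale: float,
) -> List[float]:
    """Blend noise predictions with a conditional scale.

    Assumption: classifier-free-guidance form
    eps = eps_uncond + scale * (eps_cond - eps_uncond).
    scale = 0 ignores the condition; scale = 1 uses the conditional
    prediction unchanged; scale > 1 extrapolates past it.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    # Attributes only (age, gender, ethnicity as toy numbers); speech and
    # low-resolution image are absent and therefore zero-filled.
    condition = build_condition(attributes=[1.0, 0.0, 1.0])
    print(len(condition))
    print(guided_noise([0.0, 1.0], [1.0, 3.0], scale=2.0))
```

Under these assumptions, sweeping `scale` is one way to probe how strongly each conditioning input shapes the output, which is the kind of sensitivity the abstract reports.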
Related papers
50 in total
  • [21] Noise Conditional Flow Model for Learning the Super-Resolution Space
    Kim, Younggeun
    Son, Donghee
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 424 - 432
  • [22] Face Super-Resolution Using Stochastic Differential Equations
    dos Santos, Marcelo
    Laroca, Rayson
    Ribeiro, Rafael O.
    Neves, Joao
    Proenca, Hugo
    Menotti, David
    2022 35TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI 2022), 2022, : 216 - 221
  • [23] Face Super-Resolution using Coherency Sensitive Hashing
    Choudhury, Anustup
    Segall, Andrew
    DIGITAL PHOTOGRAPHY XI, 2015, 9404
  • [24] CollageNet: Face Super-Resolution Using Reference Images
    Kim, Ji-Soo
    Kim, Chang-Su
    2022 37TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC 2022), 2022, : 916 - 917
  • [25] A Practical Approach to Multiple Super-resolution Sprite Generation
    Ye, Getian
    Wang, Yang
    Xu, Jie
    Herman, Gunawan
    Zhang, Bang
    2008 IEEE 10TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, VOLS 1 AND 2, 2008, : 73 - +
  • [26] Flexible Style Image Super-Resolution Using Conditional Objective
    Park, Seung Ho
    Moon, Young Su
    Cho, Nam Ik
    IEEE ACCESS, 2022, 10 : 9774 - 9792
  • [27] A Bayesian estimation approach to super-resolution reconstruction for face images
    Huang, Hua
    Fan, Xin
    Qi, Chun
    Zhu, Shihua
    ADVANCES IN MACHINE VISION, IMAGE PROCESSING, AND PATTERN ANALYSIS, 2006, 4153 : 406 - 415
  • [28] A LEARNING APPROACH FOR SINGLE-FRAME FACE SUPER-RESOLUTION
    He, Yu
    Yap, Kim-Hui
    Chau, Lap-Pui
    ISCAS: 2009 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-5, 2009, : 770 - 773
  • [29] Super-Resolution Diffusion Model for Accelerated MRI Reconstruction
    Mirza, Muhammad Usama
    Cukur, Tolga
    2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2023,
  • [30] Image super-resolution using conditional generative adversarial network
    Qiao, Jiaojiao
    Song, Huihui
    Zhang, Kaihua
    Zhang, Xiaolu
    Liu, Qingshan
    IET IMAGE PROCESSING, 2019, 13 (14) : 2673 - 2679