Scalable multimodal approach for face generation and super-resolution using a conditional diffusion model

Authors
Ahmed Abotaleb [1]
Mohamed W. Fakhr [1]
Mohamed Zaki [2]
Affiliations
[1] Arab Academy for Science, Technology and Maritime Transport, Computer Engineering Department
[2] Al-Azhar University, College of Engineering
Keywords
Scalable multimodal approach; Speech conditioned face generation; Speech conditioned face super-resolution; Diffusion probabilistic models; Speaker embeddings
DOI
10.1038/s41598-024-76407-9
Abstract
Multimodal conditioned face image generation and face super-resolution are significant areas of research. To achieve optimal results, this paper utilizes diffusion models as the primary engine for these tasks. This paper presents two main contributions: (1) “Speaking the Language of Faces” (SLF), a flexible, modular, fusion-less and architecturally simple multimodal system; and (2) a scalability scheme and a sensitivity analysis that can assist practitioners in system parameter estimation and feature selection. SLF consists of two main components: a feature vector generator (encoder) and an image generator (decoder) that uses a conditional diffusion model. SLF can accept various inputs, including low-resolution images, speech signals, person attributes (age, gender, ethnicity), or any combination of these. Moreover, scalability based on conditional scale values is utilized. The implementation of SLF has confirmed its versatility (e.g., speech-to-face image generation, conditioned face super-resolution). We trained multiple system versions to conduct a sensitivity analysis and to determine the influence of each individual feature on the output image. Consequently, speaker embeddings have proven to be sufficient audio features for our task. It was also found that the effects of audio signals are profound and more pronounced than those of the low-resolution images (8 × 8), whose effects are still significant. The effects of gender, ethnicity and age were found to be moderate. On another note, conditional scale values significantly impact the system’s behavior and performance.
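The abstract does not say how the conditioning vector is assembled or how the conditional scale enters sampling. As a rough illustration only, the sketch below assumes two things that are not taken from the paper: that features from the available modalities are concatenated, with zeros standing in for missing ones, and that the conditional scale acts like the guidance weight in classifier-free guidance. The names `FEATURE_DIMS`, `build_condition` and `guided_noise`, and all dimensions, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical per-modality feature sizes; the abstract does not state them.
FEATURE_DIMS: Dict[str, int] = {"speech": 4, "low_res": 3, "attributes": 3}


def build_condition(features: Dict[str, Optional[Sequence[float]]]) -> List[float]:
    """Concatenate the modality features that are present.

    Assumption: a missing modality is replaced by zeros so the vector
    always has the same length, whatever combination of inputs is given.
    """
    vector: List[float] = []
    for name, dim in FEATURE_DIMS.items():
        values = features.get(name)
        if values is None:
            vector.extend([0.0] * dim)
        else:
            if len(values) != dim:
                raise ValueError(f"{name} expects {dim} values, got {len(values)}")
            vector.extend(float(v) for v in values)
    return vector


def guided_noise(
    eps_cond: Sequence[float],
    eps_uncond: Sequence[float],
    cond_scale: float,
) -> List[float]:
    """Blend conditional and unconditional noise predictions.

    Assumption: classifier-free-guidance-style rule
    eps = eps_uncond + cond_scale * (eps_cond - eps_uncond).
    A scale of 0 ignores the condition; a scale of 1 uses the
    conditional prediction unchanged; larger values extrapolate.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("noise predictions must have the same length")
    return [u + cond_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
```

For example, `build_condition({"speech": [1, 2, 3, 4]})` yields a ten-element vector whose speech slots are filled and whose remaining slots are zero, which corresponds to a speech-only generation request under these assumptions. Sweeping `cond_scale` in `guided_noise` is one simple way to picture how a scale value could change how strongly the inputs steer the output.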