Leveraging the Latent Diffusion Models for Offline Facial Multiple Appropriate Reactions Generation

Cited by: 4
Authors
Yu, Jun [1]
Zhao, Ji [1]
Xie, Guochen [1]
Chen, Fengxin [2]
Yu, Ye [2]
Peng, Liang [3]
Li, Minglei [3]
Dai, Zonghong [3]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Hefei Univ Technol, Hefei, Peoples R China
[3] Huawei Cloud, Shenzhen, Peoples R China
Keywords
Listener Reaction; Offline Reaction Generation; Diffusion Model
DOI
10.1145/3581783.3612865
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline Multiple Appropriate Facial Reaction Generation (OMAFRG) aims to predict the reactions of different listeners to a given speaker, which is useful in scenarios such as human-computer interaction and social media analysis. In recent years, the Offline Facial Reactions Generation (OFRG) task has been explored in different ways. However, most studies focus only on the deterministic reaction of the listener. The non-deterministic setting (i.e., OMAFRG) has received little attention, and results remain far from satisfactory. Compared with deterministic OFRG, the OMAFRG task is closer to real-world conditions but is more difficult, because it requires modeling both stochasticity and context. In this paper, we propose a new model named FRDiff to tackle this issue. Our model builds on the diffusion model architecture, with some modifications that enhance its ability to aggregate context features. The inherent stochasticity of diffusion models enables our model to generate multiple reactions. We conduct experiments on the datasets provided by the ACM Multimedia REACT2023 challenge and obtain second place on the leaderboard, which demonstrates the effectiveness of our method.
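The abstract's point that sampling noise lets one speaker input yield several distinct reactions can be illustrated with a toy sketch. This is not FRDiff and not a real diffusion sampler: `toy_denoiser`, `sample_reaction`, the step sizes, and the three-value `speaker_context` are all hypothetical stand-ins, and the loop is a simplified stochastic refinement rather than the DDPM update rule.

```python
import random


def toy_denoiser(x, context):
    """Hypothetical stand-in for a learned noise predictor.

    It treats the offset of x from the speaker context as the noise to remove.
    """
    return [xi - ci for xi, ci in zip(x, context)]


def sample_reaction(context, steps=50, step_size=0.1, noise_scale=0.1, seed=0):
    """Run a toy stochastic denoising loop conditioned on `context`."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per context feature.
    x = [rng.gauss(0.0, 1.0) for _ in context]
    for t in range(steps, 0, -1):
        eps = toy_denoiser(x, context)
        # Move toward the context-conditioned estimate.
        x = [xi - step_size * ei for xi, ei in zip(x, eps)]
        # Re-inject a little noise on every step except the last.
        if t > 1:
            x = [xi + noise_scale * rng.gauss(0.0, 1.0) for xi in x]
    return x


# Made-up speaker features; each seed gives a different listener reaction.
speaker_context = [0.2, -0.5, 0.9]
reactions = [sample_reaction(speaker_context, seed=s) for s in range(3)]
```

Fixing the seed reproduces a given reaction, while changing it produces another sample for the same speaker context, which is the one-to-many behavior the task calls for.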
Pages: 9561 - 9565
Page count: 5
Related papers
27 in total
  • [1] Vector Quantized Diffusion Models for Multiple Appropriate Reactions Generation
    Nguyen, Minh-Duc
    Yang, Hyung-Jeong
    Ho, Ngoc-Huynh
    Kim, Soo-Hyung
    Kim, Seungwon
    Shin, Ji-Eun
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024,
  • [2] Dance2Music-Diffusion: leveraging latent diffusion models for music generation from dance videos
    Zhang, Chaoyang
    Hua, Yan
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01):
  • [3] Brain Imaging Generation with Latent Diffusion Models
    Pinaya, Walter H. L.
    Tudosiu, Petru-Daniel
    Dafflon, Jessica
    Da Costa, Pedro F.
    Fernandez, Virginia
    Nachev, Parashkev
    Ourselin, Sebastien
    Cardoso, M. Jorge
    DEEP GENERATIVE MODELS, DGM4MICCAI 2022, 2022, 13609 : 117 - 126
  • [4] BEAMER: Behavioral Encoder to Generate Multiple Appropriate Facial Reactions
    Hoque, Ximi
    Mann, Adamay
    Sharma, Gulshan
    Dhall, Abhinav
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 9536 - 9540
  • [5] Medical Image Generation based on Latent Diffusion Models
    Song, Wenbo
    Jiang, Yan
    Fang, Yin
    Cao, Xinyu
    Wu, Peiyan
    Xing, Hanshuo
    Wu, Xinglong
    2023 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE INNOVATION, ICAII 2023, 2023, : 89 - 93
  • [6] GENERATION OR REPLICATION: AUSCULTATING AUDIO LATENT DIFFUSION MODELS
    Bralios, Dimitrios
    Wichern, Gordon
    Germain, Francois G.
    Pan, Zexu
    Khurana, Sameer
    Hori, Chiori
    Le Roux, Jonathan
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 1156 - 1160
  • [7] REACT 2024: the Second Multiple Appropriate Facial Reaction Generation Challenge
    Song, Siyang
    Spitale, Micol
    Luo, Cheng
    Palmero, Cristina
    Barquero, German
    Zhu, Hengde
    Escalera, Sergio
    Valstar, Michel
    Baur, Tobias
    Ringeval, Fabien
    Andre, Elisabeth
    Gunes, Hatice
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024,
  • [8] Unsupervised Controllable Generation of Diffusion Models with Latent Variables in VAEs
    Kim, Minju
    Kim, Seonggyeom
    Chae, Dong-Kyu
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2024, PT 3, 2025, 14852 : 495 - 504
  • [9] PERIPHERAL BLOOD CELL IMAGING GENERATION WITH LATENT DIFFUSION MODELS
    Moncada, Yefry
    Merino, Anna
    Rodellar, Jose
    Caicedo, Alexander
    Alferez, Santiago
    INTERNATIONAL JOURNAL OF LABORATORY HEMATOLOGY, 2024, 46 : 70 - 70
  • [10] REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge
    Song, Siyang
    Spitale, Micol
    Luo, Cheng
    Barquero, German
    Palmero, Cristina
    Escalera, Sergio
    Valstar, Michel
    Baur, Tobias
    Ringeval, Fabien
    Andre, Elisabeth
    Gunes, Hatice
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 9620 - 9624