ASVFI: AUDIO-DRIVEN SPEAKER VIDEO FRAME INTERPOLATION

Times Cited: 0
Authors
Wang, Qianrui [1 ]
Li, Dengshi [1 ]
Liao, Liang [2 ]
Song, Hao [1 ]
Li, Wei [1 ]
Xiao, Jing [3 ]
Affiliations
[1] Jianghan Univ, Sch Artificial Intelligence, Wuhan, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China
Keywords
Speaker video; video frame interpolation; audio;
DOI
10.1109/ICIP49359.2023.10222345
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Due to limited transmission bandwidth, video frame rates are often low in online conferences, severely degrading the user experience. Video frame interpolation can mitigate this problem by synthesizing intermediate frames to increase the frame rate. However, most existing video frame interpolation methods rest on a linear-motion assumption, whereas mouth motion is nonlinear, so these methods cannot generate high-quality intermediate frames for speaker video. Considering the strong correlation between mouth shape and vocalization, a new method is proposed, named Audio-driven Speaker Video Frame Interpolation (ASVFI). First, we extract the audio feature with an Audio Net (ANet). Second, we extract the video feature with the Video Net (VNet) encoder. Finally, we fuse the audio and video features via AVFusion and decode the intermediate frame with the VNet decoder. Experimental results show that PSNR is nearly 0.13 dB higher than the baseline when interpolating one frame, and 0.33 dB higher when interpolating seven frames.
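The abstract's encode-fuse-decode pipeline can be sketched schematically. The record provides no implementation details, so everything below is hypothetical: the function names (`anet`, `vnet_encode`, `av_fusion`, `vnet_decode`), the feature dimensions, and the choice of concatenation plus a learned projection as the fusion step are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def anet(audio_window):
    # Hypothetical ANet: pool an audio feature window into one vector.
    return audio_window.mean(axis=0)            # shape (d_a,)

def vnet_encode(frame0, frame1):
    # Hypothetical VNet encoder: stack the two boundary-frame features.
    return np.concatenate([frame0, frame1])     # shape (2 * d_v,)

def av_fusion(a_feat, v_feat, w_fuse):
    # Hypothetical AVFusion: project the concatenated audio + video
    # features into a fused latent with a learned linear map.
    z = np.concatenate([a_feat, v_feat])
    return np.tanh(w_fuse @ z)                  # shape (d_z,)

def vnet_decode(latent, w_dec):
    # Hypothetical VNet decoder: map the fused latent back to a
    # frame-sized feature for the intermediate frame.
    return w_dec @ latent                       # shape (d_v,)

# Toy dimensions and random stand-ins for real features/weights.
d_a, d_v, d_z = 16, 32, 24
audio = rng.standard_normal((10, d_a))          # 10 audio frames
f0 = rng.standard_normal(d_v)                   # boundary frame 0
f1 = rng.standard_normal(d_v)                   # boundary frame 1
w_fuse = rng.standard_normal((d_z, d_a + 2 * d_v)) * 0.1
w_dec = rng.standard_normal((d_v, d_z)) * 0.1

mid = vnet_decode(av_fusion(anet(audio), vnet_encode(f0, f1), w_fuse), w_dec)
print(mid.shape)  # (32,) — same size as one frame feature
```

The point of the sketch is only the data flow: audio conditions the interpolation, so the decoder sees a latent that mixes mouth-motion cues from the audio with appearance cues from the two boundary frames.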
Pages: 3200 - 3204
Page Count: 5
Related Papers
50 records
  • [41] EmoFace: Audio-driven Emotional 3D Face Animation
    Liu, Chang
    Lin, Qunfen
    Zeng, Zijiao
    Pan, Ye
    2024 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES, VR 2024, 2024, : 387 - 397
  • [42] DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
    Shen, Shuai
    Zhao, Wenliang
    Meng, Zibin
    Li, Wanhua
    Zhu, Zheng
    Zhou, Jie
    Lu, Jiwen
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 1982 - 1991
  • [43] Speaker tracking audio-video system
    Cetnarowicz, Damian
    Dabrowski, Adam
    2016 SIGNAL PROCESSING: ALGORITHMS, ARCHITECTURES, ARRANGEMENTS, AND APPLICATIONS (SPA), 2016, : 230 - 233
  • [44] Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models
    Alexanderson, Simon
    Nagy, Rajmund
    Beskow, Jonas
    Henter, Gustav Eje
    ACM TRANSACTIONS ON GRAPHICS, 2023, 42 (04):
  • [45] Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
    Gan, Yuan
    Yang, Zongxin
    Yue, Xihang
    Sun, Lingyun
    Yang, Yi
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22577 - 22588
  • [46] FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
    Aneja, Shivangi
    Thies, Justus
    Dai, Angela
    Niessner, Matthias
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 21263 - 21273
  • [47] Audio-Driven Lips and Expression on 3D Human Face
    Ma, Le
    Ma, Zhihao
    Meng, Weiliang
    Xu, Shibiao
    Zhang, Xiaopeng
    ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT II, 2024, 14496 : 15 - 26
  • [48] Emotional Semantic Neural Radiance Fields for Audio-Driven Talking Head
    Lin, Haodong
    Wu, Zhonghao
    Zhang, Zhenyu
    Ma, Chao
    Yang, Xiaokang
    ARTIFICIAL INTELLIGENCE, CICAI 2022, PT II, 2022, 13605 : 532 - 544
  • [49] Softmax Splatting for Video Frame Interpolation
    Niklaus, Simon
    Liu, Feng
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 5436 - 5445
  • [50] Exploring Discontinuity for Video Frame Interpolation
    Lee, Sangjin
    Lee, Hyeongmin
    Shin, Chajin
    Son, Hanbin
    Lee, Sangyoun
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 9791 - 9800