ASVFI: AUDIO-DRIVEN SPEAKER VIDEO FRAME INTERPOLATION

Times Cited: 0
Authors
Wang, Qianrui [1 ]
Li, Dengshi [1 ]
Liao, Liang [2 ]
Song, Hao [1 ]
Li, Wei [1 ]
Xiao, Jing [3 ]
Affiliations
[1] Jianghan Univ, Sch Artificial Intelligence, Wuhan, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China
Keywords
Speaker video; video frame interpolation; audio
DOI
10.1109/ICIP49359.2023.10222345
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Due to limited transmission bandwidth, the video frame rate is low during online conferences, severely affecting user experience. Video frame interpolation can mitigate the problem by synthesizing intermediate frames to increase the frame rate. However, most existing video frame interpolation methods rely on a linear motion assumption, whereas mouth motion is nonlinear, so these methods cannot generate satisfactory intermediate frames for speaker video. Considering the strong correlation between mouth shape and vocalization, a new method is proposed, named Audio-driven Speaker Video Frame Interpolation (ASVFI). First, we extract the audio feature with Audio Net (ANet). Second, we extract the video feature with the Video Net (VNet) encoder. Finally, we fuse the audio and video features with AVFusion and decode the intermediate frame with the VNet decoder. Experimental results show that when interpolating one frame, the PSNR is nearly 0.13 dB higher than the baseline; when interpolating seven frames, the PSNR is 0.33 dB higher than the baseline.
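The abstract describes a four-stage pipeline: ANet encodes the audio, the VNet encoder encodes the boundary frames, AVFusion merges the two feature streams, and the VNet decoder produces the intermediate frame. The following is a minimal NumPy sketch of that data flow only; all function bodies, names, and tensor shapes are hypothetical placeholders, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def anet(audio):
    # Hypothetical audio encoder: pool the waveform chunk into a feature vector.
    return np.tanh(audio.reshape(-1, 160).mean(axis=1))

def vnet_encode(frame_prev, frame_next):
    # Hypothetical video encoder: pool the two boundary frames into features.
    stacked = np.stack([frame_prev, frame_next])
    return stacked.mean(axis=(1, 2))  # one scalar feature per frame

def av_fusion(video_feat, audio_feat):
    # Hypothetical AVFusion stand-in: concatenate video and audio features.
    return np.concatenate([video_feat.ravel(), audio_feat])

def vnet_decode(fused, shape):
    # Hypothetical decoder: project the fused feature back to image shape.
    return np.full(shape, fused.mean())

frame0 = rng.random((64, 64))   # previous frame
frame1 = rng.random((64, 64))   # next frame
audio = rng.random(1600)        # audio samples between the two frames

feat_a = anet(audio)
feat_v = vnet_encode(frame0, frame1)
mid_frame = vnet_decode(av_fusion(feat_v, feat_a), frame0.shape)
print(mid_frame.shape)          # (64, 64)
```

The sketch only mirrors the stated order of operations (audio features, video features, fusion, decoding); the real ASVFI networks are learned models.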
Pages: 3200-3204 (5 pages)