ASVFI: AUDIO-DRIVEN SPEAKER VIDEO FRAME INTERPOLATION

Times Cited: 0
Authors
Wang, Qianrui [1 ]
Li, Dengshi [1 ]
Liao, Liang [2 ]
Song, Hao [1 ]
Li, Wei [1 ]
Xiao, Jing [3 ]
Affiliations
[1] Jianghan Univ, Sch Artificial Intelligence, Wuhan, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China
Keywords
Speaker video; video frame interpolation; audio
DOI
10.1109/ICIP49359.2023.10222345
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Due to limited transmission bandwidth, the video frame rate is low during online conferences, severely affecting user experience. Video frame interpolation can mitigate the problem by synthesizing intermediate frames to increase the frame rate. However, most existing video frame interpolation methods rely on a linear motion assumption, whereas mouth motion is nonlinear, so these methods cannot generate satisfactory intermediate frames for speaker video. Considering the strong correlation between mouth shape and vocalization, a new method is proposed, named Audio-driven Speaker Video Frame Interpolation (ASVFI). First, we extract the audio feature with Audio Net (ANet). Second, we extract the video feature with the Video Net (VNet) encoder. Finally, we fuse the audio and video features with AVFusion and decode the intermediate frame with the VNet decoder. Experimental results show that when interpolating one frame, the PSNR is nearly 0.13 dB higher than the baseline; when interpolating seven frames, the PSNR is 0.33 dB higher than the baseline.
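The abstract describes a four-stage pipeline: ANet encodes the audio, the VNet encoder encodes the boundary frames, AVFusion merges the two feature streams, and the VNet decoder produces the intermediate frame. The following is a minimal NumPy sketch of that data flow only; all function bodies, names, and tensor shapes are hypothetical placeholders, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def anet(audio):
    # Hypothetical audio encoder: pool the waveform chunk into a feature vector.
    return np.tanh(audio.reshape(-1, 160).mean(axis=1))

def vnet_encode(frame_prev, frame_next):
    # Hypothetical video encoder: pool the two boundary frames into features.
    stacked = np.stack([frame_prev, frame_next])
    return stacked.mean(axis=(1, 2))  # one scalar feature per frame

def av_fusion(video_feat, audio_feat):
    # Hypothetical AVFusion stand-in: concatenate video and audio features.
    return np.concatenate([video_feat.ravel(), audio_feat])

def vnet_decode(fused, shape):
    # Hypothetical decoder: project the fused feature back to image shape.
    return np.full(shape, fused.mean())

frame0 = rng.random((64, 64))   # previous frame
frame1 = rng.random((64, 64))   # next frame
audio = rng.random(1600)        # audio samples between the two frames

feat_a = anet(audio)
feat_v = vnet_encode(frame0, frame1)
mid_frame = vnet_decode(av_fusion(feat_v, feat_a), frame0.shape)
print(mid_frame.shape)          # (64, 64)
```

The sketch only mirrors the stated order of operations (audio features, video features, fusion, decoding); the real ASVFI networks are learned models.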
Pages: 3200-3204 (5 pages)