ASVFI: AUDIO-DRIVEN SPEAKER VIDEO FRAME INTERPOLATION

Times Cited: 0
Authors
Wang, Qianrui [1 ]
Li, Dengshi [1 ]
Liao, Liang [2 ]
Song, Hao [1 ]
Li, Wei [1 ]
Xiao, Jing [3 ]
Affiliations
[1] Jianghan Univ, Sch Artificial Intelligence, Wuhan, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China
Keywords
Speaker video; video frame interpolation; audio;
DOI
10.1109/ICIP49359.2023.10222345
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Due to limited transmission bandwidth, video frame rates are often low in online conferences, severely degrading the user experience. Video frame interpolation can mitigate this problem by synthesizing intermediate frames to increase the frame rate. However, most existing video frame interpolation methods rest on a linear-motion assumption, whereas mouth motion is nonlinear, so these methods cannot generate high-quality intermediate frames for speaker video. Considering the strong correlation between mouth shape and vocalization, a new method is proposed, named Audio-driven Speaker Video Frame Interpolation (ASVFI). First, we extract the audio feature with an Audio Net (ANet). Second, we extract the video feature with the Video Net (VNet) encoder. Finally, we fuse the audio and video features via AVFusion and decode the intermediate frame with the VNet decoder. Experimental results show that PSNR is nearly 0.13 dB higher than the baseline when interpolating one frame, and 0.33 dB higher when interpolating seven frames.
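The abstract's encode-fuse-decode pipeline can be sketched schematically. The record provides no implementation details, so everything below is hypothetical: the function names (`anet`, `vnet_encode`, `av_fusion`, `vnet_decode`), the feature dimensions, and the choice of concatenation plus a learned projection as the fusion step are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def anet(audio_window):
    # Hypothetical ANet: pool an audio feature window into one vector.
    return audio_window.mean(axis=0)            # shape (d_a,)

def vnet_encode(frame0, frame1):
    # Hypothetical VNet encoder: stack the two boundary-frame features.
    return np.concatenate([frame0, frame1])     # shape (2 * d_v,)

def av_fusion(a_feat, v_feat, w_fuse):
    # Hypothetical AVFusion: project the concatenated audio + video
    # features into a fused latent with a learned linear map.
    z = np.concatenate([a_feat, v_feat])
    return np.tanh(w_fuse @ z)                  # shape (d_z,)

def vnet_decode(latent, w_dec):
    # Hypothetical VNet decoder: map the fused latent back to a
    # frame-sized feature for the intermediate frame.
    return w_dec @ latent                       # shape (d_v,)

# Toy dimensions and random stand-ins for real features/weights.
d_a, d_v, d_z = 16, 32, 24
audio = rng.standard_normal((10, d_a))          # 10 audio frames
f0 = rng.standard_normal(d_v)                   # boundary frame 0
f1 = rng.standard_normal(d_v)                   # boundary frame 1
w_fuse = rng.standard_normal((d_z, d_a + 2 * d_v)) * 0.1
w_dec = rng.standard_normal((d_v, d_z)) * 0.1

mid = vnet_decode(av_fusion(anet(audio), vnet_encode(f0, f1), w_fuse), w_dec)
print(mid.shape)  # (32,) — same size as one frame feature
```

The point of the sketch is only the data flow: audio conditions the interpolation, so the decoder sees a latent that mixes mouth-motion cues from the audio with appearance cues from the two boundary frames.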
Pages: 3200 - 3204
Page Count: 5
Related Papers
50 records
  • [41] EmoFace: Audio-driven Emotional 3D Face Animation
    Liu, Chang
    Lin, Qunfen
    Zeng, Zijiao
    Pan, Ye
    2024 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES, VR 2024, 2024, : 387 - 397
  • [42] DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
    Shen, Shuai
    Zhao, Wenliang
    Meng, Zibin
    Li, Wanhua
    Zhu, Zheng
    Zhou, Jie
    Lu, Jiwen
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 1982 - 1991
  • [43] Speaker tracking audio-video system
    Cetnarowicz, Damian
    Dabrowski, Adam
    2016 SIGNAL PROCESSING: ALGORITHMS, ARCHITECTURES, ARRANGEMENTS, AND APPLICATIONS (SPA), 2016, : 230 - 233
  • [44] Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models
    Alexanderson, Simon
    Nagy, Rajmund
    Beskow, Jonas
    Henter, Gustav Eje
    ACM TRANSACTIONS ON GRAPHICS, 2023, 42 (04):
  • [45] Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
    Gan, Yuan
    Yang, Zongxin
    Yue, Xihang
    Sun, Lingyun
    Yang, Yi
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22577 - 22588
  • [46] FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
    Aneja, Shivangi
    Thies, Justus
    Dai, Angela
    Niessner, Matthias
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 21263 - 21273
  • [47] Audio-Driven Lips and Expression on 3D Human Face
    Ma, Le
    Ma, Zhihao
    Meng, Weiliang
    Xu, Shibiao
    Zhang, Xiaopeng
    ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT II, 2024, 14496 : 15 - 26
  • [48] Emotional Semantic Neural Radiance Fields for Audio-Driven Talking Head
    Lin, Haodong
    Wu, Zhonghao
    Zhang, Zhenyu
    Ma, Chao
    Yang, Xiaokang
    ARTIFICIAL INTELLIGENCE, CICAI 2022, PT II, 2022, 13605 : 532 - 544
  • [49] Softmax Splatting for Video Frame Interpolation
    Niklaus, Simon
    Liu, Feng
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 5436 - 5445
  • [50] Exploring Discontinuity for Video Frame Interpolation
    Lee, Sangjin
    Lee, Hyeongmin
    Shin, Chajin
    Son, Hanbin
    Lee, Sangyoun
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 9791 - 9800