ASVFI: AUDIO-DRIVEN SPEAKER VIDEO FRAME INTERPOLATION

Cited: 0

Authors
Wang, Qianrui [1 ]
Li, Dengshi [1 ]
Liao, Liang [2 ]
Song, Hao [1 ]
Li, Wei [1 ]
Xiao, Jing [3 ]
Affiliations
[1] Jianghan Univ, Sch Artificial Intelligence, Wuhan, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China
Keywords
Speaker video; video frame interpolation; audio
DOI
10.1109/ICIP49359.2023.10222345
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Due to limited transmission bandwidth, the video frame rate is often low during online conferences, severely degrading the user experience. Video frame interpolation can alleviate this problem by synthesizing intermediate frames to increase the frame rate. However, most existing video frame interpolation methods rest on a linear motion assumption, while mouth motion is nonlinear, so these methods cannot generate satisfactory intermediate frames for speaker video. Considering the strong correlation between mouth shape and vocalization, we propose a new method named Audio-driven Speaker Video Frame Interpolation (ASVFI). First, we extract the audio feature with Audio Net (ANet). Second, we extract the video feature with the Video Net (VNet) encoder. Finally, we fuse the audio and video features with AVFusion and decode the intermediate frame with the VNet decoder. Experimental results show that when interpolating one frame, PSNR is nearly 0.13 dB higher than the baseline; when interpolating seven frames, it is 0.33 dB higher.
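The abstract's four-stage pipeline (ANet, VNet encoder, AVFusion, VNet decoder) can be sketched as follows. This is a minimal, hypothetical mock in plain Python: the module names follow the paper, but all internals (feature sizes, concatenation as the fusion, a pass-through decoder) are placeholder assumptions, not the published architecture.

```python
# Hypothetical sketch of the ASVFI data flow described in the abstract.
# Module names (ANet, VNet, AVFusion) come from the paper; everything
# inside them is a stand-in so the flow of features is easy to follow.

def anet_extract(audio_window):
    """ANet: map a raw audio window to a fixed-size audio feature.
    Placeholder: mean amplitude repeated four times."""
    mean = sum(audio_window) / len(audio_window)
    return [mean] * 4

def vnet_encode(frame_a, frame_b):
    """VNet encoder: joint feature of the two boundary frames.
    Placeholder: per-pixel average of the two frames."""
    return [(a + b) / 2 for a, b in zip(frame_a, frame_b)]

def av_fusion(video_feat, audio_feat):
    """AVFusion: combine video and audio features.
    Placeholder: simple concatenation."""
    return video_feat + audio_feat

def vnet_decode(fused, n_pixels):
    """VNet decoder: reconstruct the intermediate frame.
    Placeholder: returns the video part of the fused feature; the real
    decoder would use the audio part to model nonlinear mouth motion."""
    return fused[:n_pixels]

def interpolate_middle_frame(frame_a, frame_b, audio_window):
    audio_feat = anet_extract(audio_window)
    video_feat = vnet_encode(frame_a, frame_b)
    fused = av_fusion(video_feat, audio_feat)
    return vnet_decode(fused, len(frame_a))

# Tiny 4-"pixel" frames; the mock decoder yields the linear midpoint.
mid = interpolate_middle_frame([0, 2, 4, 6], [2, 4, 6, 8], [1.0, 1.0])
print(mid)  # [1.0, 3.0, 5.0, 7.0]
```

In the real model, the audio feature is what lets the decoder deviate from this linear midpoint around the mouth, which is exactly where linear-motion interpolators fail.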
Pages: 3200-3204 (5 pages)