ASVFI: AUDIO-DRIVEN SPEAKER VIDEO FRAME INTERPOLATION

Cited: 0

Authors
Wang, Qianrui [1 ]
Li, Dengshi [1 ]
Liao, Liang [2 ]
Song, Hao [1 ]
Li, Wei [1 ]
Xiao, Jing [3 ]
Affiliations
[1] Jianghan Univ, Sch Artificial Intelligence, Wuhan, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China
Keywords
Speaker video; video frame interpolation; audio
DOI
10.1109/ICIP49359.2023.10222345
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Due to limited transmission bandwidth, the video frame rate is often low during online conferences, severely degrading the user experience. Video frame interpolation can alleviate this problem by synthesizing intermediate frames to increase the frame rate. However, most existing video frame interpolation methods rest on a linear motion assumption, while mouth motion is nonlinear, so these methods cannot generate satisfactory intermediate frames for speaker video. Considering the strong correlation between mouth shape and vocalization, we propose a new method named Audio-driven Speaker Video Frame Interpolation (ASVFI). First, we extract the audio feature with Audio Net (ANet). Second, we extract the video feature with the Video Net (VNet) encoder. Finally, we fuse the audio and video features with AVFusion and decode the intermediate frame with the VNet decoder. Experimental results show that when interpolating one frame, PSNR is nearly 0.13 dB higher than the baseline; when interpolating seven frames, it is 0.33 dB higher.
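The abstract's four-stage pipeline (ANet, VNet encoder, AVFusion, VNet decoder) can be sketched as follows. This is a minimal, hypothetical mock in plain Python: the module names follow the paper, but all internals (feature sizes, concatenation as the fusion, a pass-through decoder) are placeholder assumptions, not the published architecture.

```python
# Hypothetical sketch of the ASVFI data flow described in the abstract.
# Module names (ANet, VNet, AVFusion) come from the paper; everything
# inside them is a stand-in so the flow of features is easy to follow.

def anet_extract(audio_window):
    """ANet: map a raw audio window to a fixed-size audio feature.
    Placeholder: mean amplitude repeated four times."""
    mean = sum(audio_window) / len(audio_window)
    return [mean] * 4

def vnet_encode(frame_a, frame_b):
    """VNet encoder: joint feature of the two boundary frames.
    Placeholder: per-pixel average of the two frames."""
    return [(a + b) / 2 for a, b in zip(frame_a, frame_b)]

def av_fusion(video_feat, audio_feat):
    """AVFusion: combine video and audio features.
    Placeholder: simple concatenation."""
    return video_feat + audio_feat

def vnet_decode(fused, n_pixels):
    """VNet decoder: reconstruct the intermediate frame.
    Placeholder: returns the video part of the fused feature; the real
    decoder would use the audio part to model nonlinear mouth motion."""
    return fused[:n_pixels]

def interpolate_middle_frame(frame_a, frame_b, audio_window):
    audio_feat = anet_extract(audio_window)
    video_feat = vnet_encode(frame_a, frame_b)
    fused = av_fusion(video_feat, audio_feat)
    return vnet_decode(fused, len(frame_a))

# Tiny 4-"pixel" frames; the mock decoder yields the linear midpoint.
mid = interpolate_middle_frame([0, 2, 4, 6], [2, 4, 6, 8], [1.0, 1.0])
print(mid)  # [1.0, 3.0, 5.0, 7.0]
```

In the real model, the audio feature is what lets the decoder deviate from this linear midpoint around the mouth, which is exactly where linear-motion interpolators fail.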
Pages: 3200-3204 (5 pages)