Audio-video collaborative JND estimation model for multimedia applications

Authors
Sheng, Ning [1]
Yin, Haibing [1, 2]
Wang, Hongkui [1, 2]
Mo, Longbin [1]
Liu, Yichen [1]
Huang, Xiaofeng [1]
Lin, Jucai [3]
Tang, Xianghong [1]
Affiliations
[1] Hangzhou Dianzi Univ, Hangzhou 310000, Zhejiang, Peoples R China
[2] Hangzhou Dianzi Univ, Lishui Res Inst, Lishui 323000, Zhejiang, Peoples R China
[3] Zhejiang Dahua Technol Co Ltd, Hangzhou 310000, Zhejiang, Peoples R China
Keywords
Just noticeable distortion (JND); Human visual system (HVS); Multimodal perception; Visual attention; Audio features; NOTICEABLE DISTORTION; LOUDNESS
DOI
10.1016/j.jvcir.2024.104254
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the rapid development of the Internet and multimedia technologies, multimedia applications integrating audio and video are becoming increasingly prevalent in both everyday life and professional environments. A critical challenge is to significantly enhance compression efficiency and bandwidth utilization while maintaining high-quality user experiences. To address this challenge, the Just Noticeable Distortion (JND) estimation model, which leverages the perceptual characteristics of the Human Visual System (HVS), is widely used in image and video coding for improved data compression. However, human visual perception is an integrative process that involves both visual and auditory stimuli. Therefore, this paper investigates the influence of audio signals on visual perception and presents a collaborative audio-video JND estimation model tailored for multimedia applications. Specifically, we characterize audio loudness, duration, and energy as temporal perceptual features, while assigning the audio saliency superimposed on the image plane as the spatial perceptual feature. An audio JND adjustment factor is then designed using a segmentation function. Finally, the proposed model combines the video-based JND model with the audio JND adjustment factor to form the audio-video collaborative JND estimation model. Compared with existing JND models, the model presented in this paper achieves the best subjective quality, with an average PSNR value of 26.97 dB. The experimental results confirm that audio significantly impacts human visual perception. The proposed audio-video collaborative JND model effectively enhances the accuracy of JND estimation for multimedia data, thereby improving compression efficiency and maintaining high-quality user experiences.
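The abstract reports PSNR for JND-contaminated content, so a small sketch of how such a figure is commonly obtained may help. This is an illustration only: the abstract does not give the paper's formulas, so the multiplicative combination of a video JND value with an audio adjustment factor, the random-sign noise injection, and the names `inject_jnd_noise`, `video_jnd`, and `audio_factor` are all assumptions. Only the PSNR definition, 10 log10(peak^2 / MSE), is standard.

```python
import math
import random


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no distortion at all
    return 10.0 * math.log10(peak ** 2 / mse)


def inject_jnd_noise(pixels, video_jnd, audio_factor, rng):
    """Add random-sign noise of magnitude audio_factor * video_jnd per pixel.

    ASSUMPTION: the audio factor scales the video JND multiplicatively.
    The abstract only says the two are combined, not how.
    """
    noisy = []
    for p, j in zip(pixels, video_jnd):
        sign = rng.choice((-1, 1))
        value = p + sign * audio_factor * j
        noisy.append(min(255.0, max(0.0, value)))  # clip to the 8-bit range
    return noisy


# Hypothetical toy data: a flat gray patch with a uniform video JND of 4 levels.
rng = random.Random(0)
pixels = [128.0] * 16
video_jnd = [4.0] * 16

video_only = inject_jnd_noise(pixels, video_jnd, 1.0, rng)
with_audio = inject_jnd_noise(pixels, video_jnd, 1.5, rng)

print(f"video-only JND noise: {psnr(pixels, video_only):.2f} dB")
print(f"audio-scaled JND noise: {psnr(pixels, with_audio):.2f} dB")
```

In this kind of evaluation, a lower PSNR at unchanged subjective quality means the model tolerates more distortion without it being noticed, which is why a larger adjustment factor lowers the PSNR in the toy example.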
Pages: 12