Audio-video collaborative JND estimation model for multimedia applications

Citations: 0

Authors:

Sheng, Ning ^{[1]}

Yin, Haibing ^{[1,2]}

Wang, Hongkui ^{[1,2]}

Mo, Longbin ^{[1]}

Liu, Yichen ^{[1]}

Huang, Xiaofeng ^{[1]}

Lin, Jucai ^{[3]}

Tang, Xianghong ^{[1]}

Affiliations:

[1] Hangzhou Dianzi Univ, Hangzhou 310000, Zhejiang, Peoples R China

[2] Hangzhou Dianzi Univ, Lishui Res Inst, Lishui 323000, Zhejiang, Peoples R China

[3] Zhejiang Dahua Technol Co Ltd, Hangzhou 310000, Zhejiang, Peoples R China

Source:

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION | 2024 / Vol. 103

Keywords:

Just noticeable distortion (JND); Human visual system (HVS); Multimodal perception; Visual attention; Audio features; NOTICEABLE DISTORTION; LOUDNESS;

DOI:

10.1016/j.jvcir.2024.104254

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

With the rapid development of the Internet and multimedia technologies, multimedia applications integrating audio and video are becoming increasingly prevalent in both everyday life and professional environments. A critical challenge is to significantly enhance compression efficiency and bandwidth utilization while maintaining high-quality user experiences. To address this challenge, the Just Noticeable Distortion (JND) estimation model, which leverages the perceptual characteristics of the Human Visual System (HVS), is widely used in image and video coding for improved data compression. However, human visual perception is an integrative process that involves both visual and auditory stimuli. Therefore, this paper investigates the influence of audio signals on visual perception and presents a collaborative audio-video JND estimation model tailored for multimedia applications. Specifically, we characterize audio loudness, duration, and energy as temporal perceptual features, while assigning the audio saliency superimposed on the image plane as the spatial perceptual feature. An audio JND adjustment factor is then designed using a segmentation function. Finally, the proposed model combines the video-based JND model with the audio JND adjustment factor to form the audio-video collaborative JND estimation model. Compared with existing JND models, the model presented in this paper achieves the best subjective quality, with an average PSNR value of 26.97 dB. The experimental results confirm that audio significantly impacts human visual perception. The proposed audio-video collaborative JND model effectively enhances the accuracy of JND estimation for multimedia data, thereby improving compression efficiency and maintaining high-quality user experiences.

Pages: 12

50 records in total

[41] Audio-video feature correlation: Faces and speech
Durand, G
Montacié, C
Caraty, MJ
Faudemay, P
MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS IV, 1999, 3846 : 102 - 112
[42] MODERN AUDIO-VIDEO MEANS AT EXHIBITIONS - REVIEW
GOSUDAREV, VK
PETELIN, VG
KHROMOV, LN
NAUCHNO-TEKHNICHESKAYA INFORMATSIYA SERIYA 1-ORGANIZATSIYA I METODIKA INFORMATSIONNOI RABOTY, 1983, (04): : 11 - 15
[43] ALife for Real and Virtual Audio-Video Performances
Pagliarini, Luigi
Lund, Henrik Hautop
PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ARTIFICIAL LIFE AND ROBOTICS (ICAROB 2014), 2014, : 5 - 9
[44] JOINT AUDIO-VIDEO DRIVEN FACIAL ANIMATION
Chen, Xin
Cao, Chen
Xue, Zehao
Chu, Wei
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 3046 - 3050
[45] ALife for Real and Virtual Audio-Video Performances
Pagliarini, Luigi
Lund, Henrik Hautop
JOURNAL OF ROBOTICS NETWORKING AND ARTIFICIAL LIFE, 2014, 1 (01): : 33 - 38
[46] An Audio-Video Feedback Platform for Radiation Therapy
Chiu, T.
Liu, H.
Brenner, M.
Dwyer, J.
Yang, M.
Jiang, S.
Gu, X.
MEDICAL PHYSICS, 2017, 44 (06)
[47] Audio-video people recognition system for an intelligent environment
Anzalone, Salvatore M.
Menegatti, Emanuele
Pagello, Enrico
Yoshikawa, Yuichiro
Ishiguro, Hiroshi
Chella, Antonio
4TH INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION (HSI 2011), 2011, : 237 - 244
[48] Audio-video switching system by infrared remote control
Ru, Guo-bao
Qin, Dan-qing
Zhang, Bing-de
Wuhan Daxue Xuebao/Journal of Wuhan University, 1999, 45 (03): : 371 - 373
[49] Frame estimation for restoring audio-video synchronization using parallelized quadratic frame interpolation
Aly, SG
Youssef, A
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-V, 2000, : 2333 - 2338
[50] PCM-MULTIPLEXED AUDIO IN A LARGE AUDIO-VIDEO ROUTING SWITCHER
BUTLER, RJ
SMPTE JOURNAL, 1976, 85 (11): : 875 - 877