Audio-video collaborative JND estimation model for multimedia applications

Cited by: 0
Authors
Sheng, Ning [1]
Yin, Haibing [1, 2]
Wang, Hongkui [1, 2]
Mo, Longbin [1]
Liu, Yichen [1]
Huang, Xiaofeng [1]
Lin, Jucai [3]
Tang, Xianghong [1]
Affiliations
[1] Hangzhou Dianzi Univ, Hangzhou 310000, Zhejiang, Peoples R China
[2] Hangzhou Dianzi Univ, Lishui Res Inst, Lishui 323000, Zhejiang, Peoples R China
[3] Zhejiang Dahua Technol Co Ltd, Hangzhou 310000, Zhejiang, Peoples R China
Keywords
Just noticeable distortion (JND); Human visual system (HVS); Multimodal perception; Visual attention; Audio features; NOTICEABLE DISTORTION; LOUDNESS
DOI
10.1016/j.jvcir.2024.104254
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the rapid development of the Internet and multimedia technologies, multimedia applications integrating audio and video are becoming increasingly prevalent in both everyday life and professional environments. A critical challenge is to significantly enhance compression efficiency and bandwidth utilization while maintaining a high-quality user experience. To address this challenge, Just Noticeable Distortion (JND) estimation models, which leverage the perceptual characteristics of the Human Visual System (HVS), are widely used in image and video coding to improve data compression. However, human perception is an integrative process that involves both visual and auditory stimuli. Therefore, this paper investigates the influence of audio signals on visual perception and presents a collaborative audio-video JND estimation model tailored for multimedia applications. Specifically, we characterize audio loudness, duration, and energy as temporal perceptual features, and take the audio saliency superimposed on the image plane as the spatial perceptual feature. An audio JND adjustment factor is then designed using a segmentation function. Finally, the proposed model combines the video-based JND model with the audio JND adjustment factor to form the audio-video collaborative JND estimation model. Compared with existing JND models, the model presented in this paper achieves the best subjective quality, with an average PSNR value of 26.97 dB. The experimental results confirm that audio significantly impacts human visual perception. The proposed audio-video collaborative JND model effectively enhances the accuracy of JND estimation for multimedia data, thereby improving compression efficiency while maintaining a high-quality user experience.
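The pipeline outlined in the abstract (a video JND map scaled by an audio adjustment factor, then evaluated through PSNR after JND-guided noise injection) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the thresholds `low` and `high`, the `gain` value, the use of loudness alone as the temporal feature, and the direction in which audio saliency weights the factor are all hypothetical. Only the PSNR formula is standard.

```python
import numpy as np


def audio_adjustment_factor(loudness, low=0.3, high=0.7, gain=0.2):
    """Hypothetical piecewise (segmented) factor for a loudness in [0, 1].

    The breakpoints and gain are placeholders, not the paper's values.
    """
    if loudness < low:
        return 1.0  # quiet audio: leave the video JND unchanged
    if loudness < high:
        # linear ramp between the two breakpoints
        return 1.0 + gain * (loudness - low) / (high - low)
    return 1.0 + gain  # loud audio: saturate at the maximum factor


def collaborative_jnd(video_jnd, audio_saliency, loudness):
    """Scale a per-pixel video JND map by the audio factor.

    Assumption: pixels with high audio saliency (values near 1) attract
    attention and therefore receive less of the extra tolerance.
    """
    factor = audio_adjustment_factor(loudness)
    weight = 1.0 + (factor - 1.0) * (1.0 - audio_saliency)
    return video_jnd * weight


def inject_jnd_noise(frame, jnd, rng):
    """Add +/- JND noise per pixel, a common way to test JND models."""
    signs = rng.choice([-1.0, 1.0], size=frame.shape)
    return np.clip(frame + signs * jnd, 0.0, 255.0)


def psnr(reference, distorted, peak=255.0):
    """Standard PSNR in dB between two frames of equal shape."""
    mse = np.mean((reference - distorted) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.uniform(50.0, 200.0, size=(64, 64))   # synthetic luma frame
    video_jnd = np.full(frame.shape, 4.0)             # placeholder JND map
    saliency = rng.uniform(0.0, 1.0, size=frame.shape)
    jnd = collaborative_jnd(video_jnd, saliency, loudness=0.8)
    noisy = inject_jnd_noise(frame, jnd, rng)
    print(f"PSNR after JND-guided noise: {psnr(frame, noisy):.2f} dB")
```

In JND-guided noise injection, a lower PSNR at comparable subjective quality indicates that the model has identified more perceptual redundancy, which is why a PSNR figure is reported alongside subjective quality.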
Pages: 12