Audio-video collaborative JND estimation model for multimedia applications

Cited by: 0
Authors
Sheng, Ning [1]
Yin, Haibing [1,2]
Wang, Hongkui [1,2]
Mo, Longbin [1]
Liu, Yichen [1]
Huang, Xiaofeng [1]
Lin, Jucai [3]
Tang, Xianghong [1]
Affiliations
[1] Hangzhou Dianzi Univ, Hangzhou 310000, Zhejiang, Peoples R China
[2] Hangzhou Dianzi Univ, Lishui Res Inst, Lishui 323000, Zhejiang, Peoples R China
[3] Zhejiang Dahua Technol Co Ltd, Hangzhou 310000, Zhejiang, Peoples R China
Keywords
Just noticeable distortion (JND); Human visual system (HVS); Multimodal perception; Visual attention; Audio features; NOTICEABLE DISTORTION; LOUDNESS;
DOI
10.1016/j.jvcir.2024.104254
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the rapid development of the Internet and multimedia technologies, multimedia applications integrating audio and video are becoming increasingly prevalent in both everyday life and professional environments. A critical challenge is to significantly enhance compression efficiency and bandwidth utilization while maintaining high-quality user experiences. To address this challenge, the Just Noticeable Distortion (JND) estimation model, which leverages the perceptual characteristics of the Human Visual System (HVS), is widely used in image and video coding for improved data compression. However, human visual perception is an integrative process that involves both visual and auditory stimuli. Therefore, this paper investigates the influence of audio signals on visual perception and presents a collaborative audio-video JND estimation model tailored for multimedia applications. Specifically, we characterize audio loudness, duration, and energy as temporal perceptual features, while assigning the audio saliency superimposed on the image plane as the spatial perceptual feature. An audio JND adjustment factor is then designed using a segmentation function. Finally, the proposed model combines the video-based JND model with the audio JND adjustment factor to form the audio-video collaborative JND estimation model. Compared with existing JND models, the model presented in this paper achieves the best subjective quality, with an average PSNR value of 26.97 dB. The experimental results confirm that audio significantly impacts human visual perception. The proposed audio-video collaborative JND model effectively enhances the accuracy of JND estimation for multimedia data, thereby improving compression efficiency and maintaining high-quality user experiences.
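The pipeline outlined in the abstract (temporal audio features, a piecewise adjustment factor, and a combination with a video JND map weighted by audio saliency) can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: the function names, the use of RMS level as a loudness proxy, the thresholds `low_db` and `high_db`, the ceiling `max_gain`, the linear middle segment, and the choice that louder audio raises the threshold. It is not the authors' model.

```python
import math


def rms_level_db(samples, eps=1e-12):
    """RMS level of a non-empty audio frame in dB re full scale.

    Used here only as a crude stand-in for perceived loudness (assumption).
    """
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq + eps)


def audio_adjustment_factor(level_db, low_db=-40.0, high_db=-10.0, max_gain=1.3):
    """Hypothetical piecewise (segmented) adjustment factor.

    Quiet audio leaves the JND unchanged (factor 1.0), loud audio saturates
    at max_gain, and the factor is linear in between. All constants are
    illustrative, not taken from the paper.
    """
    if level_db <= low_db:
        return 1.0
    if level_db >= high_db:
        return max_gain
    t = (level_db - low_db) / (high_db - low_db)
    return 1.0 + t * (max_gain - 1.0)


def collaborative_jnd(video_jnd, saliency, factor):
    """Scale a video JND map by the audio factor, weighted per pixel.

    video_jnd and saliency are equally sized 2-D lists; saliency values lie
    in [0, 1]. A pixel with saliency 0 keeps its video-only threshold, and a
    pixel with saliency 1 receives the full factor (assumed weighting).
    """
    return [
        [jnd * (1.0 + (factor - 1.0) * s) for jnd, s in zip(jnd_row, sal_row)]
        for jnd_row, sal_row in zip(video_jnd, saliency)
    ]
```

In use, one would compute `rms_level_db` for the audio frame aligned with a video frame, pass the result to `audio_adjustment_factor`, and apply `collaborative_jnd` to a JND map produced by any video-based model. The duration and energy features mentioned in the abstract are omitted from this sketch.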
Pages: 12