Audio-video collaborative JND estimation model for multimedia applications

Authors
Sheng, Ning [1]
Yin, Haibing [1,2]
Wang, Hongkui [1,2]
Mo, Longbin [1]
Liu, Yichen [1]
Huang, Xiaofeng [1]
Lin, Jucai [3]
Tang, Xianghong [1]
Affiliations
[1] Hangzhou Dianzi Univ, Hangzhou 310000, Zhejiang, Peoples R China
[2] Hangzhou Dianzi Univ, Lishui Res Inst, Lishui 323000, Zhejiang, Peoples R China
[3] Zhejiang Dahua Technol Co Ltd, Hangzhou 310000, Zhejiang, Peoples R China
Keywords
Just noticeable distortion (JND); Human visual system (HVS); Multimodal perception; Visual attention; Audio features; Noticeable distortion; Loudness
DOI
10.1016/j.jvcir.2024.104254
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the rapid development of the Internet and multimedia technologies, multimedia applications integrating audio and video are becoming increasingly prevalent in both everyday life and professional environments. A critical challenge is to significantly enhance compression efficiency and bandwidth utilization while maintaining high-quality user experiences. To address this challenge, the Just Noticeable Distortion (JND) estimation model, which leverages the perceptual characteristics of the Human Visual System (HVS), is widely used in image and video coding for improved data compression. However, human visual perception is an integrative process that involves both visual and auditory stimuli. Therefore, this paper investigates the influence of audio signals on visual perception and presents a collaborative audio-video JND estimation model tailored for multimedia applications. Specifically, we characterize audio loudness, duration, and energy as temporal perceptual features, while assigning the audio saliency superimposed on the image plane as the spatial perceptual feature. An audio JND adjustment factor is then designed using a segmentation function. Finally, the proposed model combines the video-based JND model with the audio JND adjustment factor to form the audio-video collaborative JND estimation model. Compared with existing JND models, the model presented in this paper achieves the best subjective quality, with an average PSNR value of 26.97 dB. The experimental results confirm that audio significantly impacts human visual perception. The proposed audio-video collaborative JND model effectively enhances the accuracy of JND estimation for multimedia data, thereby improving compression efficiency and maintaining high-quality user experiences.
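The pipeline outlined in the abstract (a video JND map scaled by an audio adjustment factor, then evaluated by PSNR) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the thresholds `lo`, `hi`, and `max_gain`, the use of loudness alone as the temporal feature, and the direction of the effect (louder audio raising the visual threshold, audio-salient regions receiving less of that increase) are all assumptions made for illustration. Only the PSNR formula is standard.

```python
import numpy as np


def audio_adjustment_factor(loudness, lo=0.3, hi=0.7, max_gain=1.2):
    """Hypothetical piecewise function mapping normalized loudness in [0, 1]
    to a JND scaling factor. Thresholds and gain are illustrative only."""
    if loudness <= lo:
        return 1.0                      # quiet audio: no adjustment
    if loudness >= hi:
        return max_gain                 # loud audio: saturated adjustment
    # linear ramp between the two thresholds
    return 1.0 + (max_gain - 1.0) * (loudness - lo) / (hi - lo)


def collaborative_jnd(video_jnd, audio_saliency, loudness):
    """Scale a video JND map by the audio factor.

    video_jnd:      2-D array of per-pixel JND thresholds
    audio_saliency: 2-D array in [0, 1], same shape (1 = most salient)
    Assumption: audio-salient regions receive less threshold elevation.
    """
    alpha = audio_adjustment_factor(loudness)
    spatial_gain = 1.0 + (alpha - 1.0) * (1.0 - audio_saliency)
    return video_jnd * spatial_gain


def psnr(reference, distorted, peak=255.0):
    """Standard PSNR in dB: 10 * log10(peak^2 / MSE)."""
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
    video_jnd = np.full(frame.shape, 6.0)      # placeholder threshold map
    saliency = rng.random(frame.shape)         # placeholder saliency map
    jnd = collaborative_jnd(video_jnd, saliency, loudness=0.8)
    # Inject random-sign noise at the JND level, as is common in JND evaluation.
    signs = rng.choice([-1.0, 1.0], size=frame.shape)
    noisy = np.clip(frame + signs * jnd, 0, 255)
    print(f"PSNR of JND-noise-injected frame: {psnr(frame, noisy):.2f} dB")
```

In JND evaluation, a lower PSNR at the same subjective quality indicates that the model tolerates more injected noise, which is how a figure such as the reported average of 26.97 dB is usually read.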
Pages: 12
Related papers
  • [1] An Audio-video Collaborative JND Estimation Model for Multimedia Data
    Sheng, Ning
    Yin, Haibing
    Wang, Hongkui
    Wang, Xia
    2024 DATA COMPRESSION CONFERENCE, DCC, 2024, : 581 - 581
  • [2] Audio-video synchronization management in embedded multimedia applications
    Rehman, Hamood-Ur
    Kim, Taehyun
    Avadhanam, Niranjan
    Subramanian, Sridharan
    COMPUTATIONAL IMAGING VI, 2008, 6814
  • [3] An audio-video front-end for multimedia applications
    Zotkin, D
    Duraiswami, R
    Davis, L
    Haritaoglu, I
    SMC 2000 CONFERENCE PROCEEDINGS: 2000 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOL 1-5, 2000, : 786 - 791
  • [4] Joint audio-video processing for multimedia
    Chen, T
    Rao, R
    PROCEEDINGS OF THE 1996 IEEE IECON - 22ND INTERNATIONAL CONFERENCE ON INDUSTRIAL ELECTRONICS, CONTROL, AND INSTRUMENTATION, VOLS 1-3, 1996, : 548 - 553
  • [5] COLLABORATIVE LEARNING TO GENERATE AUDIO-VIDEO JOINTLY
    Kurmi, Vinod K.
    Bajaj, Vipul
    Patro, Badri N.
    Venkatesh, K. S.
    Namboodiri, Vinay P.
    Jyothi, Preethi
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4180 - 4184
  • [6] Audio-Video steganography
    Kakde, Yugeshwari
    Gonnade, Priyanka
    Dahiwale, Prashant
    2015 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION, EMBEDDED AND COMMUNICATION SYSTEMS (ICIIECS), 2015,
  • [7] Audio-video biometric recognition for non-collaborative access granting
    Micheloni, Christian
    Canazza, Sergio
    Foresti, Gian Luca
    JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 2009, 20 (06): : 353 - 367
  • [8] Transcribing audio-video archives
    Barras, C
    Allauzen, A
    Lamel, L
    Gauvain, JL
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 13 - 16
  • [9] AUDIO-VIDEO TUTORIAL PROGRAM
    SYROCKI, J
    THOMAS, CS
    FAIRCHILD, GC
    AMERICAN BIOLOGY TEACHER, 1969, 31 (02): : 91 - +
  • [10] RATE-COVERAGE ANALYSIS AND OPTIMIZATION FOR JOINT AUDIO-VIDEO MULTIMEDIA RETRIEVAL
    Ning, Guanghan
    Zhang, Zhi
    Ren, Xiaobo
    Wang, Haohong
    He, Zhihai
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2911 - 2915