Perceptual Audio Object Coding Using Adaptive Subband Grouping with CNN and Residual Block

Cited by: 1
Authors
Wu, Yulin [1]
Hu, Ruimin [1]
Wang, Xiaochen [1]
Affiliations
[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan, Peoples R China
Keywords
spatial audio object coding (SAOC); perceptual coding; adaptive subband grouping; aliasing distortion
DOI
10.1109/ICME55011.2023.00433
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spatial audio content is becoming increasingly popular and is commonly represented as a set of object signals with associated metadata. This object-based representation is independent of the loudspeaker layout and provides high spatial resolution when reproduced on more loudspeakers. However, the traditional spatial audio object coding (SAOC) method suffers from severe aliasing distortion, which degrades audio quality and impairs the immersive listening experience. In this study, we reduce aliasing distortion with a perceptual adaptive subband grouping strategy and use a convolutional neural network (CNN) and a residual block to build the side-information compression model. Objective and subjective experiments on benchmark datasets at different bitrates show that the proposed method performs favorably against state-of-the-art methods.
Pages: 2543-2548
Page count: 6