Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding

Cited by: 4
Authors
Das, Sneha [1]
Backstrom, Tom [1]
Affiliations
[1] Aalto Univ, Dept Signal Proc & Acoust, Espoo, Finland
Keywords
Quantization noise; Speech modelling; Postfiltering; Noise filling; Time-frequency correlation
DOI
10.21437/Interspeech.2018-1027
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Advanced coding algorithms yield high-quality signals with good coding efficiency within their target bit-rate ranges, but their performance suffers outside the target range. At lower bit rates, performance degrades because the decoded signals are sparse, which gives the signal a perceptually muffled and distorted character. Standard codecs reduce such distortions by applying noise filling and post-filtering methods. In this paper, we propose a post-processing method based on modeling the inherent time-frequency correlation in the log-magnitude spectrum. The goal is to improve the perceptual SNR of the decoded signals and to reduce the distortions caused by signal sparsity. Objective measures show an average improvement of 1.5 dB for input perceptual SNR in the range 4 to 18 dB. The improvement is especially prominent in components that had been quantized to zero.
Pages: 3543-3547
Page count: 5