Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding

Cited by: 4

Authors:

Das, Sneha [1]

Backstrom, Tom [1]

Affiliations:

[1] Aalto Univ, Dept Signal Proc & Acoust, Espoo, Finland

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Quantization noise; Speech modelling; Postfiltering; Noise filling; Time-frequency correlation

DOI:

10.21437/Interspeech.2018-1027

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Advanced coding algorithms yield high-quality signals with good coding efficiency within their target bit-rate ranges, but their performance suffers outside the target range. At lower bit rates, performance degrades because the decoded signals are sparse, which gives the signal a perceptually muffled and distorted character. Standard codecs reduce such distortions by applying noise filling and post-filtering methods. In this paper, we propose a post-processing method based on modeling the inherent time-frequency correlation in the log-magnitude spectrum. The goal is to improve the perceptual SNR of the decoded signals and to reduce the distortions caused by signal sparsity. Objective measures show an average improvement of 1.5 dB for input perceptual SNR in the range 4 to 18 dB. The improvement is especially prominent in components that had been quantized to zero.


Pages: 3543-3547

Number of pages: 5
