Low Bit-Rate Speech Coding Through Quantization of Mel-Frequency Cepstral Coefficients

Cited by: 28

Authors:

Boucheron, Laura E. [1]

De Leon, Phillip L. [1]

Sandoval, Steven [1]

Affiliations:

[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / Issue 02

Keywords:

Speech analysis; speech coding; objective quality measures; recognition

DOI:

10.1109/TASL.2011.2162407

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we propose a low bit-rate speech codec based on vector quantization (VQ) of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of phase information. By evaluating the contribution that individual MFCCs make toward speech quality and applying appropriate quantization, we show that the MFCC-based codec exceeds the state-of-the-art MELPe codec across the entire range of 600-2400 bps when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T Recommendation P.862). The main advantage of the proposed codec is in distributed speech recognition (DSR), since the MFCCs can be applied directly, eliminating the additional decoding and feature-extraction stages; furthermore, the proposed codec better preserves the fidelity of the MFCCs and yields better word accuracy rates than the CELP and MELPe codecs.
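
The abstract names vector quantization of MFCC vectors as the core of the codec but gives no implementation details. The following is a minimal, generic sketch of VQ using a k-means codebook, not the authors' method: the function names, the 13-dimensional random vectors standing in for MFCCs, the codebook size, and the frame rate are all assumptions chosen for illustration.

```python
import numpy as np

def encode(vectors, codebook):
    """Return the index of the nearest codeword for each input vector."""
    # Squared Euclidean distance between every vector and every codeword.
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def decode(indices, codebook):
    """Replace each transmitted index by its codeword."""
    return codebook[indices]

def train_codebook(vectors, num_codewords, num_iters=20, seed=0):
    """Plain k-means (Lloyd) codebook training; illustrative only."""
    rng = np.random.default_rng(seed)
    # Initialise the codewords from randomly chosen training vectors.
    start = rng.choice(len(vectors), size=num_codewords, replace=False)
    codebook = vectors[start].copy()
    for _ in range(num_iters):
        labels = encode(vectors, codebook)
        for k in range(num_codewords):
            members = vectors[labels == k]
            if len(members) > 0:  # leave an empty cell's codeword unchanged
                codebook[k] = members.mean(axis=0)
    return codebook

def bit_rate(num_codewords, frames_per_second):
    """Bits per second when one codeword index is sent per frame."""
    return np.log2(num_codewords) * frames_per_second

# Hypothetical data: random vectors standing in for per-frame MFCCs.
rng = np.random.default_rng(1)
frames = rng.normal(size=(500, 13))
codebook = train_codebook(frames, num_codewords=64)
indices = encode(frames, codebook)        # what would be transmitted
reconstructed = decode(indices, codebook)  # what the receiver recovers
rate = bit_rate(64, frames_per_second=50)  # 6 bits/frame * 50 frames/s
```

In this sketch only the codeword indices are transmitted, so the bit rate depends on the codebook size and the frame rate rather than on the dimension of the MFCC vector. How the paper actually allocates bits among individual coefficients is not stated in the abstract.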


Pages: 610-619

Page count: 10

Total: 50 items

[31] MUSICAL INSTRUMENT IDENTIFICATION USING MULTISCALE MEL-FREQUENCY CEPSTRAL COEFFICIENTS
Sturm, Bob L.
Morvidone, Marcela
Daudet, Laurent
18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010), 2010, : 477 - 481
[32] SIGNAL MODELS FOR LOW BIT-RATE CODING OF SPEECH
FLANAGAN, JL
ISHIZAKA, K
SHIPLEY, KL
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1980, 68 (03): : 780 - 791
[33] Techniques of very low bit-rate speech coding
Cui, HJ
Tang, K
Zhao, M
Zhang, X
CHINESE JOURNAL OF ELECTRONICS, 2004, 13 (01): : 63 - 65
[34] Joint Quantization Strategies for Low Bit-Rate Sinusoidal Coding
Unver, Emre
Villette, Stephane
Kondoz, Ahmet
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2571 - 2574
[35] Speech Reconstruction from Mel-frequency Cepstral Coefficients via l1-norm Minimization
Min, Gang
Zhang, Xiongwei
Yang, Jibin
Zou, Xia
2015 IEEE 17TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2015,
[36] Algorithm for speech emotion recognition classification based on Mel-frequency Cepstral coefficients and broad learning system
Zhiyou Yang
Ying Huang
Evolutionary Intelligence, 2022, 15 : 2485 - 2494
[37] Algorithm for speech emotion recognition classification based on Mel-frequency Cepstral coefficients and broad learning system
Yang, Zhiyou
Huang, Ying
EVOLUTIONARY INTELLIGENCE, 2022, 15 (04) : 2485 - 2494
[38] Vocal Fold Pathology Assessment Using Mel-Frequency Cepstral Coefficients and Linear Predictive Cepstral Coefficients Features
Saldanha, Jennifer C.
Ananthakrishna, T.
Pinto, Rohan
JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2014, 4 (02) : 168 - 173
[39] Extracting Mel-Frequency and Bark-Frequency Cepstral Coefficients from Encrypted Signals
Thaine, Patricia
Penn, Gerald
INTERSPEECH 2019, 2019, : 3715 - 3719
[40] Hidden Markov Model Neurons Classification based on Mel-frequency Cepstral Coefficients
Haggag, Sherif
Mohamed, Shady
Haggag, Hussein
Nahavandi, Saeid
PROCEEDINGS OF THE 2014 9TH INTERNATIONAL CONFERENCE ON SYSTEM OF SYSTEMS ENGINEERING (SOSE 2014), 2014, : 166 - 170
