Low Bit-Rate Speech Coding Through Quantization of Mel-Frequency Cepstral Coefficients

Cited by: 28
Authors
Boucheron, Laura E. [1]
De Leon, Phillip L. [1]
Sandoval, Steven [1]
Affiliations
[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012, Vol. 20, No. 2
Keywords
Speech analysis; speech coding; objective quality measures; recognition
DOI
10.1109/TASL.2011.2162407
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a low bit-rate speech codec based on vector quantization (VQ) of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of phase information. By evaluating the contribution that individual MFCCs make toward speech quality and applying appropriate quantization, our results show that the MFCC-based codec exceeds the state-of-the-art MELPe codec across the entire range of 600-2400 bps, when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T Recommendation P.862). The main advantage of the proposed codec is in distributed speech recognition (DSR), since the MFCCs can be applied directly, thus eliminating additional decoding and feature extraction stages; furthermore, the proposed codec better preserves the fidelity of the MFCCs and yields better word accuracy rates than the CELP and MELPe codecs.
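To make the idea of vector quantization of MFCC frames concrete, here is a minimal sketch of a generic VQ encoder and decoder. It is not the codec described in the abstract: the k-means codebook training, the function names (`train_codebook`, `encode`, `decode`, `bits_per_second`), the 13-dimensional vectors, and the codebook size and frame rate in the usage note are all illustrative assumptions, since this record does not give the paper's actual design.

```python
import math
import numpy as np

def encode(vectors, codebook):
    """Return, for each input vector, the index of the nearest codeword."""
    # Squared Euclidean distance from every vector to every codeword.
    dist = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def decode(indices, codebook):
    """Reconstruct vectors by looking up the transmitted indices."""
    return codebook[indices]

def train_codebook(vectors, n_codewords, n_iter=20, seed=0):
    """Plain k-means (Lloyd) training; a generic stand-in, assumed here."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), n_codewords, replace=False)
    codebook = vectors[start].copy()
    for _ in range(n_iter):
        idx = encode(vectors, codebook)
        for k in range(n_codewords):
            members = vectors[idx == k]
            if len(members):  # leave empty cells unchanged
                codebook[k] = members.mean(axis=0)
    return codebook

def bits_per_second(n_codewords, frames_per_second):
    """Bit rate when one codebook index is sent per frame."""
    return math.log2(n_codewords) * frames_per_second
```

As a usage illustration with hypothetical numbers, a single 256-entry codebook sent at 50 frames per second costs `bits_per_second(256, 50)` = 400 bps; only the indices are transmitted, and the receiver recovers approximate MFCC vectors with `decode`.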
Pages: 610-619
Page count: 10
Related papers
50 in total
  • [31] MUSICAL INSTRUMENT IDENTIFICATION USING MULTISCALE MEL-FREQUENCY CEPSTRAL COEFFICIENTS
    Sturm, Bob L.
    Morvidone, Marcela
    Daudet, Laurent
    18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010), 2010, : 477 - 481
  • [32] SIGNAL MODELS FOR LOW BIT-RATE CODING OF SPEECH
    FLANAGAN, JL
    ISHIZAKA, K
    SHIPLEY, KL
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1980, 68 (03): : 780 - 791
  • [33] Techniques of very low bit-rate speech coding
    Cui, HJ
    Tang, K
    Zhao, M
    Zhang, X
    CHINESE JOURNAL OF ELECTRONICS, 2004, 13 (01): : 63 - 65
  • [34] Joint Quantization Strategies for Low Bit-Rate Sinusoidal Coding
    Unver, Emre
    Villette, Stephane
    Kondoz, Ahmet
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2571 - 2574
  • [35] Speech Reconstruction from Mel-frequency Cepstral Coefficients via l1-norm Minimization
    Min, Gang
    Zhang, Xiongwei
    Yang, Jibin
    Zou, Xia
    2015 IEEE 17TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2015
  • [36] Algorithm for speech emotion recognition classification based on Mel-frequency Cepstral coefficients and broad learning system
    Zhiyou Yang
    Ying Huang
    Evolutionary Intelligence, 2022, 15 : 2485 - 2494
  • [37] Algorithm for speech emotion recognition classification based on Mel-frequency Cepstral coefficients and broad learning system
    Yang, Zhiyou
    Huang, Ying
    EVOLUTIONARY INTELLIGENCE, 2022, 15 (04) : 2485 - 2494
  • [38] Vocal Fold Pathology Assessment Using Mel-Frequency Cepstral Coefficients and Linear Predictive Cepstral Coefficients Features
    Saldanha, Jennifer C.
    Ananthakrishna, T.
    Pinto, Rohan
    JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2014, 4 (02) : 168 - 173
  • [39] Extracting Mel-Frequency and Bark-Frequency Cepstral Coefficients from Encrypted Signals
    Thaine, Patricia
    Penn, Gerald
    INTERSPEECH 2019, 2019, : 3715 - 3719
  • [40] Hidden Markov Model Neurons Classification based on Mel-frequency Cepstral Coefficients
    Haggag, Sherif
    Mohamed, Shady
    Haggag, Hussein
    Nahavandi, Saeid
    PROCEEDINGS OF THE 2014 9TH INTERNATIONAL CONFERENCE ON SYSTEM OF SYSTEMS ENGINEERING (SOSE 2014), 2014, : 166 - 170