Waveform quantization of speech using Gaussian mixture models

Cited by: 0
Author
Samuelsson, J. [1]
Affiliation
[1] Royal Institute of Technology (KTH), Department of Signals, Sensors and Systems, Stockholm, Sweden
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Waveform quantization of speech using Gaussian mixture models (GMMs) is proposed. GMMs are trained directly on the speech waveform, and high-dimensional vector quantizers (VQs) that efficiently exploit the redundancy in the speech signal are constructed from the GMM parameters. Two types of GMMs are studied. The complexity of the scheme is independent of the rate, and the rate can be changed without retraining the VQ. A shape-gain structure improves both performance and robustness. Pre- and post-processing using spectral amplitude warping further improves perceptual quality. A 32-dimensional VQ operating at 2 bits/sample reproduces speech sampled at 8 kHz with a PESQ score of 4.2.
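The abstract describes building high-dimensional VQs directly from GMM parameters, with an encoding complexity that does not depend on the rate and a rate that can be changed without retraining. The sketch below illustrates that general idea under our own simplifying assumptions; it is not the quantizer used in the paper. It assumes a diagonal-covariance GMM (the hypothetical `DiagGMM` class), spends ceil(log2 M) bits on the component index, splits the remaining bits across dimensions with a greedy high-rate allocation (`allocate_bits`), and uses bounded uniform scalar quantizers clipped at ±4 standard deviations (`LOADING`, an assumed value). Each input vector is quantized with every component's quantizer and the component giving the smallest squared error is kept. The shape-gain structure and spectral amplitude warping are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

LOADING = 4.0  # assumption: each scalar quantizer covers +/- 4 standard deviations


@dataclass
class DiagGMM:
    """Hypothetical diagonal-covariance GMM, assumed already trained on waveform frames."""
    means: List[List[float]]  # M x D component means
    stds: List[List[float]]   # M x D per-dimension standard deviations


def allocate_bits(stds: Sequence[float], total_bits: int) -> List[int]:
    """Greedy integer bit allocation: each bit goes to the dimension with the
    largest high-rate distortion estimate sigma^2 * 2^(-2b)."""
    bits = [0] * len(stds)
    for _ in range(total_bits):
        d = max(range(len(stds)), key=lambda i: stds[i] ** 2 * 4.0 ** (-bits[i]))
        bits[d] += 1
    return bits


class GmmQuantizer:
    """Fixed-rate quantizer derived from GMM parameters alone (a sketch, not the paper's design)."""

    def __init__(self, gmm: DiagGMM, bits_per_vector: int):
        self.gmm = gmm
        m = len(gmm.means)
        index_bits = math.ceil(math.log2(m)) if m > 1 else 0
        if bits_per_vector < index_bits:
            raise ValueError("rate too low to signal the component index")
        # The allocation is computed once per rate, so encoding cost does not grow with the rate.
        self.bits = [allocate_bits(s, bits_per_vector - index_bits) for s in gmm.stds]

    def _decode_one(self, k: int, idxs: Sequence[int]) -> List[float]:
        out = []
        for i, b, mu, s in zip(idxs, self.bits[k], self.gmm.means[k], self.gmm.stds[k]):
            # Midpoint of cell i of a 2**b-level uniform quantizer on [-LOADING, LOADING];
            # with b = 0 this is 0, i.e. the component mean.
            z = -LOADING + (i + 0.5) * (2 * LOADING / 2 ** b)
            out.append(mu + s * z)
        return out

    def encode(self, x: Sequence[float]) -> Tuple[int, List[int]]:
        """Quantize x with every component's quantizer and keep the best: O(M * D) work."""
        best = None
        for k in range(len(self.gmm.means)):
            idxs = []
            for xi, b, mu, s in zip(x, self.bits[k], self.gmm.means[k], self.gmm.stds[k]):
                levels = 2 ** b
                step = 2 * LOADING / levels
                cell = math.floor(((xi - mu) / s + LOADING) / step)
                idxs.append(min(max(cell, 0), levels - 1))  # clip to the quantizer range
            y = self._decode_one(k, idxs)
            err = sum((a - c) ** 2 for a, c in zip(x, y))
            if best is None or err < best[0]:
                best = (err, k, idxs)
        return best[1], best[2]

    def decode(self, code: Tuple[int, List[int]]) -> List[float]:
        k, idxs = code
        return self._decode_one(k, idxs)


if __name__ == "__main__":
    gmm = DiagGMM(means=[[0.0] * 4, [1.0, -1.0, 1.0, -1.0]],
                  stds=[[1.0, 0.5, 0.25, 0.1], [0.3] * 4])
    x = [0.2, -0.1, 0.05, 0.0]
    for rate in (9, 17, 33):  # same GMM, three rates, no retraining
        q = GmmQuantizer(gmm, rate)
        print(rate, q.decode(q.encode(x)))
```

Constructing a new `GmmQuantizer` from the same `DiagGMM` changes the rate without any retraining, and `encode` does a fixed amount of work per component and dimension whatever the rate, because no stored codebook is searched. For scale, the operating point quoted in the abstract (32 dimensions at 2 bits/sample) amounts to 64 bits per vector, or 16 kbit/s at 8 kHz sampling.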
Pages: 165-168
Number of pages: 4
Related papers
  • [1] Stochastic modeling and quantization of harmonic phases in speech using wrapped gaussian mixture models
    Agiomyrgiannakis, Yannis
    Stylianou, Yannis
    [J]. 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol IV, Pts 1-3, 2007: 1121-1124
  • [2] Speech spectrum quantization using gaussian mixture models and multi-dimensional companding
    Subramaniam, AD
    Gardner, WR
    Rao, BD
    [J]. 2002 IEEE SPEECH CODING WORKSHOP PROCEEDINGS: A PARADIGM SHIFT TOWARD NEW CODING FUNCTIONS FOR THE BROADBAND AGE, 2002: 5-7
  • [3] Speech Enhancement Using Gaussian Scale Mixture Models
    Hao, Jiucang
    Lee, Te-Won
    Sejnowski, Terrence J.
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06): 1127-1136
  • [4] Emotional speech classification using Gaussian mixture models
    Ververidis, D
    Kotropoulos, C
    [J]. 2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS, 2005: 2871-2874
  • [5] Articulatory Controllable Speech Modification based on Gaussian Mixture Models with Direct Waveform Modification using Spectrum Differential
    Tobing, Patrick Lumban
    Kobayashi, Kazuhiro
    Toda, Tomoki
    Neubig, Graham
    Sakti, Sakriani
    Nakamura, Satoshi
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 3350-3354
  • [6] Emotion Recognition from Speech using Gaussian Mixture Model and Vector Quantization
    Agrawal, Surabhi
    Dongaonkar, Shabda
    [J]. 2015 4TH INTERNATIONAL CONFERENCE ON RELIABILITY, INFOCOM TECHNOLOGIES AND OPTIMIZATION (ICRITO) (TRENDS AND FUTURE DIRECTIONS), 2015
  • [7] Vector quantization based on Gaussian mixture models
    Hedelin, P
    Skoglund, J
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04): 385-401
  • [8] On Entropy-Constrained Vector Quantization using Gaussian Mixture Models
    Zhao, David Y.
    Samuelsson, Jonas
    Nilsson, Mattias
    [J]. IEEE TRANSACTIONS ON COMMUNICATIONS, 2008, 56 (12): 2094-2104
  • [9] Age Approximation from Speech using Gaussian Mixture Models
    Mittal, Tanushri
    Barthwal, Anurag
    Koolagudi, Shashidhar G.
    [J]. 2013 SECOND INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING, NETWORKING AND SECURITY (ADCONS 2013), 2013: 74-78
  • [10] Recognition of Emotions in German Speech Using Gaussian Mixture Models
    Vondra, Martin
    Vich, Robert
    [J]. MULTIMODAL SIGNAL: COGNITIVE AND ALGORITHMIC ISSUES, 2009, 5398: 256-263