Speech coding with an analysis-by-synthesis sinusoidal model

Authors
Etemoglu, ÇÖ [1]
Cuperman, V [1]
Gersho, A [1]
Affiliations
[1] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
We introduce a general and powerful approach to sinusoidal modeling of speech wherein a closed-loop Analysis-by-Synthesis (AbS) technique sequentially extracts the parameters for each sinusoidal component. Low bit-rate speech coding is achieved by efficiently constraining the allowed frequencies of sinusoidal components into sets of frequency intervals or bins. In conjunction with the closed-loop analysis, the constrained frequency regions allow us to efficiently vector quantize the frequency information in each frame. In voiced frames, two sets of frequency vectors are generated: one for harmonically related components and the other for non-harmonically related components of the voiced segment. In transition frames, a vector of nonuniformly spaced frequencies is selected from a frequency codebook using frequency bin vector quantization (FBVQ) to represent the frequency domain information. The effectiveness of the coding scheme is enhanced by exploiting the critical band concept of auditory perception in defining the frequency bins. In transition segments, the sinusoidal phases are modeled and coded. Subjective tests with a partially quantized model indicate that, for a target rate of 4 kbps, the coder quality exceeds that of the G.729 standard at 8 kbps.
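The closed-loop idea in the abstract, extracting one sinusoid at a time from frequencies restricted to bins, can be illustrated with a minimal sketch. This is not the authors' coder: the function names, the 10 Hz candidate grid, the bin edges, and the plain least-squares error criterion are all assumptions made for illustration, and the sketch omits perceptual weighting, critical-band bin design, and vector quantization (FBVQ).

```python
import math


def fit_sinusoid(residual, freq_hz, sample_rate):
    """Least-squares fit of a*cos + b*sin at one frequency.

    Returns (a, b, energy_removed). Hypothetical helper, not from the paper.
    """
    w = 2.0 * math.pi * freq_hz / sample_rate
    cos_v = [math.cos(w * t) for t in range(len(residual))]
    sin_v = [math.sin(w * t) for t in range(len(residual))]
    cc = sum(c * c for c in cos_v)
    ss = sum(s * s for s in sin_v)
    cs = sum(c * s for c, s in zip(cos_v, sin_v))
    rc = sum(r * c for r, c in zip(residual, cos_v))
    rs = sum(r * s for r, s in zip(residual, sin_v))
    det = cc * ss - cs * cs
    if det < 1e-9:  # degenerate basis (e.g. 0 Hz): skip this candidate
        return 0.0, 0.0, 0.0
    a = (rc * ss - rs * cs) / det
    b = (rs * cc - rc * cs) / det
    return a, b, a * rc + b * rs


def abs_sinusoidal_analysis(frame, bins, sample_rate, step_hz=10.0):
    """Greedy analysis-by-synthesis: one sinusoid per frequency bin.

    For each bin (lo, hi) in Hz, every candidate frequency on the grid is
    synthesized and compared against the current residual; the candidate
    that removes the most energy is kept and subtracted before moving on.
    """
    residual = list(frame)
    components = []
    for lo, hi in bins:
        best = None
        for k in range(int((hi - lo) / step_hz) + 1):
            freq = lo + k * step_hz
            a, b, removed = fit_sinusoid(residual, freq, sample_rate)
            if best is None or removed > best[3]:
                best = (freq, a, b, removed)
        freq, a, b, _ = best
        w = 2.0 * math.pi * freq / sample_rate
        residual = [r - a * math.cos(w * t) - b * math.sin(w * t)
                    for t, r in enumerate(residual)]
        # a*cos(wt) + b*sin(wt) == amp*cos(wt + phase)
        components.append((freq, math.hypot(a, b), math.atan2(-b, a)))
    return components, residual


# Usage: a 20 ms frame at 8 kHz containing two sinusoids (made-up test signal).
sample_rate = 8000
frame = [0.8 * math.cos(2 * math.pi * 450 * t / sample_rate + 0.3)
         + 0.5 * math.cos(2 * math.pi * 1250 * t / sample_rate - 1.0)
         for t in range(160)]
components, residual = abs_sinusoidal_analysis(
    frame, [(300.0, 600.0), (1000.0, 1500.0)], sample_rate)
for freq, amp, phase in components:
    print(f"{freq:.0f} Hz  amp={amp:.3f}  phase={phase:.3f}")
```

Because each component is chosen by comparing its synthesized waveform against the remaining signal, the selected frequency is always one of a small, known set per bin, which is what makes the frequency information cheap to index or vector quantize.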
Pages: 1371-1374
Page count: 4