ON THE USE OF NEURAL NETWORKS IN ARTICULATORY SPEECH SYNTHESIS

Cited by: 26
Authors
RAHIM, MG
GOODYEAR, CC
KLEIJN, WB
SCHROETER, J
SONDHI, MM
Institutions
[1] UNIV LIVERPOOL, DEPT ELECT ENGN, LIVERPOOL L69 3BX, ENGLAND
[2] AT&T BELL LABS, ACOUST RES DEPT, MURRAY HILL, NJ 07974 USA
Source
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
DOI
10.1121/1.405559
Chinese Library Classification
O42 [Acoustics];
Subject Classification
070206; 082403;
Abstract
A long-standing problem in the analysis and synthesis of speech by articulatory description is the estimation of the vocal tract shape parameters from natural input speech. Methods to relate spectral parameters to articulatory positions are feasible if a sufficiently large amount of data is available. This, however, results in a high computational load and large memory requirements. Further, one needs to accommodate ambiguities in this mapping due to the nonuniqueness problem (i.e., several vocal tract shapes can result in identical spectral envelopes). This paper describes the use of artificial neural networks for acoustic-to-articulatory parameter mapping. Experimental results show that a single feed-forward neural net is unable to perform this mapping sufficiently well when trained on a large data set. An alternative procedure is proposed, based on an assembly of neural networks. Each network is assigned to a specific region of the articulatory space and performs a mapping from cepstral values to tract areas. The training of this assembly is executed in two stages: in the first stage, a codebook of suitably normalized articulatory parameters is used, and in the second stage, real speech data are used to further improve the mapping. During synthesis, neural networks are selected by dynamic programming using a criterion that ensures smoothly varying vocal tract shapes while maintaining a good spectral match. The method is able to accommodate nonuniqueness in the acoustic-to-articulatory mapping and can be bootstrapped efficiently from natural speech. Results on the performance of this procedure, compared with other mapping procedures including codebook lookup and a single multilayered network, are presented.
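The assembly-plus-dynamic-programming scheme described in the abstract can be illustrated with a short sketch. The Python below is a minimal, hypothetical illustration, not the authors' implementation: the dimensions (12 cepstral coefficients, 10 tract areas, 4 regions, 20 hidden units), the one-hidden-layer network shape, the linear stand-in forward model inside spectral_mismatch, and the quadratic smoothness penalty are all assumptions made for the example; the paper's actual networks, cost terms, and two-stage training (codebook bootstrap followed by refinement on real speech) are not reproduced here.

    import numpy as np

    # Illustrative dimensions (assumptions, not taken from the paper).
    N_CEPSTRA, N_AREAS, N_REGIONS, N_HIDDEN = 12, 10, 4, 20
    rng = np.random.default_rng(0)

    class RegionNet:
        """One feed-forward net of the assembly, assigned to a single region
        of articulatory space; maps a cepstral vector to vocal tract areas.
        Weights are random placeholders: the paper trains them first on a
        codebook of normalized articulatory parameters, then on real speech."""
        def __init__(self):
            self.W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_CEPSTRA))
            self.b1 = np.zeros(N_HIDDEN)
            self.W2 = rng.normal(scale=0.1, size=(N_AREAS, N_HIDDEN))
            self.b2 = np.zeros(N_AREAS)

        def predict(self, cepstra):
            return self.W2 @ np.tanh(self.W1 @ cepstra + self.b1) + self.b2

    # Stand-in forward model (an assumption): a fixed linear map from tract
    # areas back to cepstra. The paper instead synthesizes speech from the
    # candidate tract shape and compares spectra with the input frame.
    AREAS_TO_CEP = rng.normal(scale=0.1, size=(N_CEPSTRA, N_AREAS))

    def spectral_mismatch(areas, cepstra):
        """Proxy for the paper's spectral-match term."""
        return float(np.sum((AREAS_TO_CEP @ areas - cepstra) ** 2))

    def select_by_dp(frames, nets, lam=1.0):
        """Pick one network per frame by dynamic programming, trading off
        spectral match against smoothly varying vocal tract shapes."""
        T, K = len(frames), len(nets)
        # Candidate tract shape from every network at every frame: (T, K, N_AREAS).
        cand = np.array([[net.predict(c) for net in nets] for c in frames])
        match = np.array([[spectral_mismatch(cand[t, k], frames[t])
                           for k in range(K)] for t in range(T)])
        cost = np.full((T, K), np.inf)   # best accumulated cost ending at (t, k)
        back = np.zeros((T, K), dtype=int)
        cost[0] = match[0]
        for t in range(1, T):
            for k in range(K):
                # Quadratic penalty on frame-to-frame change in tract areas
                # (an assumed form of the smoothness criterion).
                trans = lam * np.sum((cand[t, k] - cand[t - 1]) ** 2, axis=1)
                j = int(np.argmin(cost[t - 1] + trans))
                cost[t, k] = match[t, k] + cost[t - 1, j] + trans[j]
                back[t, k] = j
        # Trace back the lowest-cost network sequence.
        path = [int(np.argmin(cost[-1]))]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        path.reverse()
        return np.array([cand[t, k] for t, k in enumerate(path)])

    nets = [RegionNet() for _ in range(N_REGIONS)]
    frames = [rng.normal(size=N_CEPSTRA) for _ in range(5)]   # fake cepstral frames
    areas = select_by_dp(frames, nets)
    print(areas.shape)   # (5, 10): one tract-area vector per frame

The per-region specialization is what lets an assembly of this kind represent the one-to-many acoustic-to-articulatory mapping: several networks may propose different tract shapes for the same frame, and the dynamic-programming pass resolves the ambiguity by preferring the trajectory with the smoothest articulatory motion for a comparable spectral match.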
Pages: 1109-1121
Page count: 13
Related Papers
(50 total)
  • [1] Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks
    Gao, Yingming
    Birkholz, Peter
    Li, Ya
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1845 - 1858
  • [2] Data driven articulatory synthesis with deep neural networks
    Aryal, Sandesh
    Gutierrez-Osuna, Ricardo
    [J]. COMPUTER SPEECH AND LANGUAGE, 2016, 36 : 260 - 273
  • [3] ARTICULATORY FEATURES FROM DEEP NEURAL NETWORKS AND THEIR ROLE IN SPEECH RECOGNITION
    Mitra, Vikramjit
    Sivaraman, Ganesh
    Nam, Hosung
    Espy-Wilson, Carol
    Saltzman, Elliot
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [4] Phone-based speech synthesis with neural network and articulatory control
    Lo, WK
    Ching, PC
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2227 - 2230
  • [5] Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition
    Mitra, Vikramjit
    Sivaraman, Ganesh
    Nam, Hosung
    Espy-Wilson, Carol
    Saltzman, Elliot
    Tiede, Mark
    [J]. SPEECH COMMUNICATION, 2017, 89 : 103 - 112
  • [6] A method to extract articulatory parameters from the speech signal using Neural Networks
    Branco, A
    Tome, A
    Teixeira, A
    Vaz, F
    [J]. DSP 97: 1997 13TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING PROCEEDINGS, VOLS 1 AND 2: SPECIAL SESSIONS, 1997, : 583 - 586
  • [7] Centerline articulatory models of the velum and epiglottis for articulatory synthesis of speech
    Laprie, Yves
    Elie, Benjamin
    Tsukanova, Anastasiia
    Vuissoz, Pierre-Andre
    [J]. 2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2110 - 2114
  • [8] ARTICULATORY FEATURES FOR EXPRESSIVE SPEECH SYNTHESIS
    Black, Alan W.
    Bunnell, H. Timothy
    Dou, Ying
    Muthukumar, Prasanna Kumar
    Metze, Florian
    Perry, Daniel
    Polzehl, Tim
    Prahallad, Kishore
    Steidl, Stefan
    Vaughn, Callie
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4005 - 4008
  • [9] On the Contribution of Articulatory Features to Speech Synthesis
    Matura, Martin
    Juzova, Marketa
    Matousek, Jindrich
    [J]. SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 398 - 407
  • [10] ARTICULATORY INVERSION AND SYNTHESIS: TOWARDS ARTICULATORY-BASED MODIFICATION OF SPEECH
    Aryal, Sandesh
    Gutierrez-Osuna, Ricardo
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7952 - 7956