Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated

Cited by: 6
Authors
Ribeiro, Vinicius [1]
Isaieva, Karyna [2]
Leclere, Justine [2, 3]
Vuissoz, Pierre-Andre [2]
Laprie, Yves [1]
Affiliations
[1] Univ Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
[2] Univ Lorraine, INSERM, U1254, IADI, F-54000 Nancy, France
[3] Hop Maison Blanche, Serv Medecine Bucco dentaire, F-51100 Reims, France
Keywords
Phonetic-to-articulatory; Speech production; Vocal tract shape; MRI; Segmentation
DOI
10.1016/j.specom.2022.04.004
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Articulatory speech synthesis requires generating realistic vocal tract shapes from the sequence of phonemes to be articulated. This work proposes the first model trained from rt-MRI films to automatically predict all of the vocal tract articulators' contours. The data are the contours tracked in the rt-MRI database recorded for one speaker. Those contours were exploited to train an encoder-decoder network to map the sequence of phonemes and their durations to the exact gestures performed by the speaker. Different from other works, all the individual articulator contours are predicted separately, allowing the investigation of their interactions. We measure four tract variables closely coupled with critical articulators and observe their variations over time. The test demonstrates that our model can produce high-quality shapes of the complete vocal tract with a good correlation between the predicted and the target variables observed in rt-MRI films, even though the tract variables are not included in the optimization procedure.
Pages: 1-13
Number of pages: 13