Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated

Cited by: 6

Authors:

Ribeiro, Vinicius ^{[1]}

Isaieva, Karyna ^{[2]}

Leclere, Justine ^{[2,3]}

Vuissoz, Pierre-Andre ^{[2]}

Laprie, Yves ^{[1]}

Affiliations:

[1] Univ Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France

[2] Univ Lorraine, INSERM, U1254, IADI, F-54000 Nancy, France

[3] Hop Maison Blanche, Serv Medecine Bucco dentaire, F-51100 Reims, France

Source:

SPEECH COMMUNICATION | 2022 / Volume 141

Keywords:

Phonetic-to-articulatory; Speech production; Vocal tract shape; MRI; Segmentation

DOI:

10.1016/j.specom.2022.04.004

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Articulatory speech synthesis requires generating realistic vocal tract shapes from the sequence of phonemes to be articulated. This work proposes the first model trained from rt-MRI films to automatically predict all of the vocal tract articulators' contours. The data are the contours tracked in the rt-MRI database recorded for one speaker. Those contours were exploited to train an encoder-decoder network to map the sequence of phonemes and their durations to the exact gestures performed by the speaker. Different from other works, all the individual articulator contours are predicted separately, allowing the investigation of their interactions. We measure four tract variables closely coupled with critical articulators and observe their variations over time. The test demonstrates that our model can produce high-quality shapes of the complete vocal tract with a good correlation between the predicted and the target variables observed in rt-MRI films, even though the tract variables are not included in the optimization procedure.


Pages: 1 / 13

Page count: 13
