Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated

Times cited: 6

Authors:

Ribeiro, Vinicius [1]

Isaieva, Karyna [2]

Leclere, Justine [2, 3]

Vuissoz, Pierre-Andre [2]

Laprie, Yves [1]

Affiliations:

[1] Univ Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France

[2] Univ Lorraine, INSERM, U1254, IADI, F-54000 Nancy, France

[3] Hop Maison Blanche, Serv Medecine Bucco dentaire, F-51100 Reims, France

Source:

SPEECH COMMUNICATION | 2022 / Vol. 141

Keywords:

Phonetic-to-articulatory; Speech production; Vocal tract shape; MRI; Segmentation

DOI:

10.1016/j.specom.2022.04.004

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification code:

070206; 082403

Abstract:

Articulatory speech synthesis requires generating realistic vocal tract shapes from the sequence of phonemes to be articulated. This work proposes the first model trained on real-time MRI (rt-MRI) films to automatically predict the contours of all the vocal tract articulators. The data are the articulator contours tracked in an rt-MRI database recorded from a single speaker. These contours were used to train an encoder-decoder network that maps the sequence of phonemes and their durations to the gestures actually performed by the speaker. Unlike other works, the model predicts each articulator contour separately, which makes it possible to investigate the interactions between articulators. We measure four tract variables closely coupled with critical articulators and track their variations over time. The tests show that the model produces high-quality shapes of the complete vocal tract, with good correlation between the predicted tract variables and the target ones observed in the rt-MRI films, even though the tract variables are not included in the optimization procedure.
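
The abstract evaluates the model by correlating predicted and target tract variables but does not define them in this record. As a minimal sketch, assuming each tract variable is taken as the per-frame minimum Euclidean distance between two articulator contours given as (N, 2) arrays of midsagittal points, the comparison could look like the code below. The function names, the contour representation and the synthetic "tongue" and "palate" data are hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def min_contour_distance(contour_a: np.ndarray, contour_b: np.ndarray) -> float:
    """Smallest Euclidean distance between any point of contour_a and any point of contour_b.

    Assumption: each contour is an (N, 2) array of (x, y) points in the midsagittal plane.
    """
    # Pairwise differences via broadcasting: shape (N_a, N_b, 2)
    diffs = contour_a[:, None, :] - contour_b[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())


def tract_variable_series(frames_a, frames_b) -> np.ndarray:
    """Hypothetical tract variable: per-frame minimum distance between two articulators."""
    return np.array([min_contour_distance(a, b) for a, b in zip(frames_a, frames_b)])


def pearson_r(predicted: np.ndarray, target: np.ndarray) -> float:
    """Pearson correlation between predicted and target tract-variable time series."""
    return float(np.corrcoef(predicted, target)[0, 1])


if __name__ == "__main__":
    # Synthetic example (not data from the paper): a flat "palate" contour and a
    # "tongue" contour that moves up and down over 50 frames.
    t = np.linspace(0.0, 2.0 * np.pi, 50)
    xs = np.linspace(0.0, 10.0, 20)
    palate = np.stack([xs, np.full_like(xs, 5.0)], axis=1)
    target_tongue = [np.stack([xs, np.full_like(xs, 2.0 + np.sin(ti))], axis=1) for ti in t]
    # "Predicted" tongue: the target shifted up by a constant 0.1 (a systematic error)
    predicted_tongue = [c + np.array([0.0, 0.1]) for c in target_tongue]

    palate_frames = [palate] * len(t)
    tv_target = tract_variable_series(target_tongue, palate_frames)
    tv_pred = tract_variable_series(predicted_tongue, palate_frames)
    print(f"Pearson r = {pearson_r(tv_pred, tv_target):.3f}")
```

Because Pearson correlation ignores a constant offset, the synthetic prediction above reaches r = 1.000 despite its systematic 0.1 error; an absolute error measure such as RMSE would be needed alongside the correlation to expose that kind of bias.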

Pages: 1-13

Page count: 13