Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated

Cited by: 6
Authors
Ribeiro, Vinicius [1]
Isaieva, Karyna [2]
Leclere, Justine [2, 3]
Vuissoz, Pierre-Andre [2]
Laprie, Yves [1]
Affiliations
[1] Univ Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
[2] Univ Lorraine, INSERM, U1254, IADI, F-54000 Nancy, France
[3] Hop Maison Blanche, Serv Medecine Bucco dentaire, F-51100 Reims, France
Keywords
Phonetic-to-articulatory; Speech production; Vocal tract shape; MRI; Segmentation
DOI
10.1016/j.specom.2022.04.004
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Articulatory speech synthesis requires generating realistic vocal tract shapes from the sequence of phonemes to be articulated. This work proposes the first model trained from rt-MRI films to automatically predict all of the vocal tract articulators' contours. The data are the contours tracked in the rt-MRI database recorded for one speaker. Those contours were exploited to train an encoder-decoder network to map the sequence of phonemes and their durations to the exact gestures performed by the speaker. Different from other works, all the individual articulator contours are predicted separately, allowing the investigation of their interactions. We measure four tract variables closely coupled with critical articulators and observe their variations over time. The test demonstrates that our model can produce high-quality shapes of the complete vocal tract with a good correlation between the predicted and the target variables observed in rt-MRI films, even though the tract variables are not included in the optimization procedure.
Pages: 1-13
Number of pages: 13