Human Vocal Tract Analysis by in Vivo 3D MRI during Phonation: A Complete System for Imaging, Quantitative Modeling, and Speech Synthesis

Authors
Wismueller, Axel [1, 2]
Behrends, Johannes [1, 2]
Hoole, Phil [3]
Leinsinger, Gerda L. [4]
Reiser, Maximilian F. [4]
Westesson, Per-Lennart [1, 2]
Affiliations
[1] Univ Rochester, Dept Imaging Sci, 601 Elmwood Ave, Box 648, Rochester, NY 14642 USA
[2] Univ Rochester, Dept Biomed Engn, Rochester, NY 14642 USA
[3] Univ Munich, Dept Phonet, D-80799 Munich, Germany
[4] Univ Munich, Dept Radiol, D-80336 Munich, Germany
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a complete system for image-based 3D vocal tract analysis, ranging from MR image acquisition during phonation, semi-automatic image processing, and quantitative modeling including model-based speech synthesis, to quantitative model evaluation by comparison between recorded and synthesized phoneme sounds. For this purpose, six professionally trained speakers, aged 22-34 years, were examined using a standardized MRI protocol (1.5 T, T1w FLASH, ST 4 mm, 23 slices, acq. time 21 s). The volunteers performed a prolonged (>= 21 s) emission of sounds of the German phonemic inventory. A simultaneous audio tape recording was obtained to verify correct utterance. Scans were acquired in each of the axial, coronal, and sagittal planes. Computer-aided quantitative 3D evaluation included (i) automated registration of the phoneme-specific data acquired in different slice orientations, (ii) semi-automated segmentation of oropharyngeal structures, (iii) computation of a curvilinear vocal tract midline in 3D by nonlinear PCA, and (iv) computation of cross-sectional areas of the vocal tract perpendicular to this midline. For the vowels, the extracted area functions were used to synthesize phoneme sounds based on an articulatory-acoustic model. For quantitative analysis, recorded and synthesized phonemes were compared, where area functions extracted from 2D midsagittal slices were used as a reference. All vowels could be identified correctly based on the synthesized phoneme sounds. The comparison between synthesized and recorded vowel phonemes revealed that the quality of phoneme sound synthesis was improved for phonemes /a/, /o/, and /y/ if 3D instead of 2D data were used, as measured by the average relative frequency shift between recorded and synthesized vowel formants (p < 0.05, one-sided Wilcoxon rank sum test). In summary, the combination of fast MRI followed by 3D segmentation and analysis is a novel approach to examine human phonation in vivo. It unveils functional anatomical findings that may be essential for realistic modelling of the human vocal tract during speech production.
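As a rough illustration of the evaluation metric named in the abstract, the following Python sketch computes an average relative frequency shift between recorded and synthesized formants. The abstract does not give the exact formula, so the definition used here (absolute deviation divided by the recorded frequency, averaged over formants) is an assumption, and the function name and all formant values are hypothetical and chosen only for illustration.

```python
def mean_relative_formant_shift(recorded_hz, synthesized_hz):
    """Average of |f_syn - f_rec| / f_rec over paired formant frequencies.

    Assumed definition: the abstract names the metric but does not spell it out.
    """
    if len(recorded_hz) != len(synthesized_hz) or not recorded_hz:
        raise ValueError("need two non-empty formant lists of equal length")
    shifts = [abs(s - r) / r for r, s in zip(recorded_hz, synthesized_hz)]
    return sum(shifts) / len(shifts)


# Hypothetical formant values in Hz (F1..F3), not taken from the paper.
recorded = [700.0, 1200.0, 2500.0]
synth_3d = [735.0, 1140.0, 2500.0]   # synthesis from a 3D area function
synth_2d = [770.0, 1080.0, 2750.0]   # synthesis from a 2D midsagittal reference

shift_3d = mean_relative_formant_shift(recorded, synth_3d)
shift_2d = mean_relative_formant_shift(recorded, synth_2d)
print(f"3D shift: {shift_3d:.3f}, 2D shift: {shift_2d:.3f}")
```

Under this assumed definition, a smaller value means the synthesized formants lie closer to the recorded ones. Per-speaker values for the 3D and 2D variants could then be compared with a one-sided rank sum test, as the abstract reports.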
Pages: 306-312
Number of pages: 7