Human Vocal Tract Analysis by in Vivo 3D MRI during Phonation: A Complete System for Imaging, Quantitative Modeling, and Speech Synthesis

Cited by: 0
Authors
Wismueller, Axel [1, 2]
Behrends, Johannes [1, 2]
Hoole, Phil [3]
Leinsinger, Gerda L. [4]
Reiser, Maximilian F. [4]
Westesson, Per-Lennart [1, 2]
Affiliations
[1] Univ Rochester, Dept Imaging Sci, 601 Elmwood Ave, Box 648, Rochester, NY 14642 USA
[2] Univ Rochester, Dept Biomed Engn, Rochester, NY 14642 USA
[3] Univ Munich, Dept Phonet, D-80799 Munich, Germany
[4] Univ Munich, Dept Radiol, D-80336 Munich, Germany
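Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract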
We present a complete system for image-based 3D vocal tract analysis, ranging from MR image acquisition during phonation, through semi-automatic image processing and quantitative modeling including model-based speech synthesis, to quantitative model evaluation by comparison between recorded and synthesized phoneme sounds. For this purpose, six professionally trained speakers, aged 22-34 years, were examined using a standardized MRI protocol (1.5 T, T1w FLASH, ST 4 mm, 23 slices, acq. time 21 s). The volunteers performed a prolonged (>= 21 s) emission of sounds of the German phonemic inventory. Simultaneous audio tape recording was obtained to verify correct utterance. Scans were made in axial, coronal, and sagittal planes each. Computer-aided quantitative 3D evaluation included (i) automated registration of the phoneme-specific data acquired in different slice orientations, (ii) semi-automated segmentation of oropharyngeal structures, (iii) computation of a curvilinear vocal tract midline in 3D by nonlinear PCA, and (iv) computation of cross-sectional areas of the vocal tract perpendicular to this midline. For the vowels, the extracted area functions were used to synthesize phoneme sounds based on an articulatory-acoustic model. For quantitative analysis, recorded and synthesized phonemes were compared, where area functions extracted from 2D midsagittal slices were used as a reference. All vowels could be identified correctly based on the synthesized phoneme sounds. The comparison between synthesized and recorded vowel phonemes revealed that the quality of phoneme sound synthesis was improved for phonemes /a/, /o/, and /y/ if 3D instead of 2D data were used, as measured by the average relative frequency shift between recorded and synthesized vowel formants (p < 0.05, one-sided Wilcoxon rank sum test). In summary, the combination of fast MRI followed by subsequent 3D segmentation and analysis is a novel approach to examine human phonation in vivo.
It unveils functional anatomical findings that may be essential for realistic modelling of the human vocal tract during speech production.
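
The abstract evaluates synthesis quality by the average relative frequency shift between recorded and synthesized vowel formants, but does not give the formula. The following is a minimal sketch under one plausible reading: the mean of |f_syn - f_rec| / f_rec over paired formants. The function name `relative_formant_shift` and all formant values are hypothetical and serve only as illustration; they are not data from the study.

```python
from statistics import mean


def relative_formant_shift(recorded_hz, synthesized_hz):
    """Mean of |f_syn - f_rec| / f_rec over paired formant frequencies.

    Assumed definition: the abstract names the metric but not its formula.
    """
    if len(recorded_hz) != len(synthesized_hz) or not recorded_hz:
        raise ValueError("need equally long, non-empty formant lists")
    return mean(abs(s - r) / r for r, s in zip(recorded_hz, synthesized_hz))


# Illustrative formant frequencies in Hz (made up, not from the paper).
recorded = [700.0, 1200.0, 2500.0]
synth_3d = [735.0, 1140.0, 2500.0]   # hypothetical synthesis from 3D area functions
synth_2d = [770.0, 1080.0, 2750.0]   # hypothetical synthesis from 2D midsagittal data

shift_3d = relative_formant_shift(recorded, synth_3d)
shift_2d = relative_formant_shift(recorded, synth_2d)
print(f"3D shift: {shift_3d:.3f}, 2D shift: {shift_2d:.3f}")
```

Under this reading, a smaller value means the synthesized formants lie closer to the recorded ones. Per-speaker shifts for the 3D and 2D conditions could then be compared with a one-sided rank sum test, as the abstract reports.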
Pages: 306-312
Page count: 7