We present a complete system for image-based 3D vocal tract analysis, ranging from MR image acquisition during phonation, through semi-automatic image processing and quantitative modeling including model-based speech synthesis, to quantitative model evaluation by comparing recorded and synthesized phoneme sounds. For this purpose, six professionally trained speakers, aged 22-34 years, were examined using a standardized MRI protocol (1.5 T, T1w FLASH, ST 4 mm, 23 slices, acq. time 21 s). The volunteers performed a prolonged (>= 21 s) emission of sounds of the German phonemic inventory. A simultaneous audio tape recording was made to verify correct utterance. Scans were acquired in each of the axial, coronal, and sagittal planes. Computer-aided quantitative 3D evaluation included (i) automated registration of the phoneme-specific data acquired in different slice orientations, (ii) semi-automated segmentation of oropharyngeal structures, (iii) computation of a curvilinear vocal tract midline in 3D by nonlinear PCA, and (iv) computation of cross-sectional areas of the vocal tract perpendicular to this midline. For the vowels, the extracted area functions were used to synthesize phoneme sounds based on an articulatory-acoustic model. For quantitative analysis, recorded and synthesized phonemes were compared, with area functions extracted from 2D midsagittal slices used as a reference. All vowels could be identified correctly based on the synthesized phoneme sounds. The comparison between synthesized and recorded vowel phonemes revealed that the quality of phoneme sound synthesis improved for the phonemes /a/, /o/, and /y/ when 3D instead of 2D data were used, as measured by the average relative frequency shift between recorded and synthesized vowel formants (p < 0.05, one-sided Wilcoxon rank sum test). In summary, the combination of fast MRI followed by 3D segmentation and analysis is a novel approach to examine human phonation in vivo. 
It unveils functional anatomical findings that may be essential for realistic modelling of the human vocal tract during speech production.
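
The evaluation measure named above, the average relative frequency shift between recorded and synthesized vowel formants, can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the definition used here (mean of |f_synth - f_rec| / f_rec over the formants of one vowel) is an assumption, and the function name and all formant values are hypothetical, chosen only for illustration.

```python
def average_relative_formant_shift(recorded_hz, synthesized_hz):
    """Mean of |f_synth - f_rec| / f_rec over paired formants.

    Assumed definition: the abstract names the measure but does not
    spell out the formula.
    """
    if not recorded_hz or len(recorded_hz) != len(synthesized_hz):
        raise ValueError("need two non-empty formant lists of equal length")
    shifts = [abs(s - r) / r for r, s in zip(recorded_hz, synthesized_hz)]
    return sum(shifts) / len(shifts)


# Hypothetical formant frequencies (Hz) for one vowel; not data from the study.
recorded = [700.0, 1200.0, 2500.0]
synth_3d = [735.0, 1140.0, 2500.0]   # synthesized from a 3D area function
synth_2d = [770.0, 1080.0, 2750.0]   # synthesized from a 2D midsagittal area function

shift_3d = average_relative_formant_shift(recorded, synth_3d)
shift_2d = average_relative_formant_shift(recorded, synth_2d)
print(f"3D: {shift_3d:.3f}  2D: {shift_2d:.3f}")
```

Under this assumed definition, a smaller value means the synthesized formants lie closer to the recorded ones. Per-speaker values for the 3D and 2D conditions would then be the two samples passed to the one-sided rank sum test.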