Synthesizing Audio from Tongue Motion During Speech Using Tagged MRI Via Transformer

Cited by: 2

Authors:

Liu, Xiaofeng [1,2]

Xing, Fangxu [1,2]

Prince, Jerry L. [3]

Stone, Maureen [4]

El Fakhri, Georges [1,2]

Woo, Jonghye [1,2]

Affiliations:

[1] Massachusetts Gen Hosp, Gordon Ctr Med Imaging, Boston, MA 02114 USA

[2] Harvard Med Sch, Boston, MA 02114 USA

[3] Johns Hopkins Univ, Dept Elect & Comp Engn, Baltimore, MD 21218 USA

[4] Univ Maryland Sch Dent, Dept Neural & Pain Sci, Baltimore, MD 21201 USA

Source:

MEDICAL IMAGING 2023 | 2023 / Vol. 12464

Keywords:

Motion Fields; Transformer; Audio Synthesis; MRI

DOI:

10.1117/12.2653345

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Investigating the relationship between internal tissue point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI and intelligible speech can aid in advancing speech motor control theories and developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate of the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolution decoder to generate spectrograms that correspond to 4D motion fields. Furthermore, we incorporate a generative adversarial training approach into our framework to further improve the synthesis quality of the generated spectrograms. We experiment on 63 paired motion field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.
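
The abstract uses 2D spectrograms as a surrogate for one-dimensional audio waveforms. As a rough illustration of that representation only, the sketch below turns a waveform into a magnitude spectrogram with a short-time Fourier transform. It is not the authors' pipeline: the record does not state the spectrogram type, window, or hop size, so `frame_len`, `hop`, the Hann window, the sample rate, and the test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(wave, frame_len=64, hop=32):
    """Return a list of time frames, each holding frame_len // 2 + 1 magnitudes.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(wave) - frame_len + 1, hop):
        seg = [wave[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT, O(frame_len^2); an FFT would be used in practice.
            acc = sum(
                seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


# Hypothetical test signal: a pure tone placed exactly on DFT bin 8.
sample_rate = 8000                      # assumed, not from the source
tone_hz = 8 * sample_rate / 64          # 1000 Hz
wave = [math.sin(2.0 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(wave)      # 2D array: time frames x frequency bins
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

Here `spec` is indexed as time frame by frequency bin, which is the kind of 2D array a convolutional decoder can be trained to produce. Recovering an audible waveform from such a magnitude array additionally requires phase estimation or a vocoder, which this sketch does not cover.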

Pages: 5
