Synthesizing Audio from Tongue Motion During Speech Using Tagged MRI Via Transformer

Cited by: 2
Authors
Liu, Xiaofeng [1, 2]
Xing, Fangxu [1, 2]
Prince, Jerry L. [3]
Stone, Maureen [4]
El Fakhri, Georges [1, 2]
Woo, Jonghye [1, 2]
Affiliations
[1] Massachusetts Gen Hosp, Gordon Ctr Med Imaging, Boston, MA 02114 USA
[2] Harvard Med Sch, Boston, MA 02114 USA
[3] Johns Hopkins Univ, Dept Elect & Comp Engn, Baltimore, MD 21218 USA
[4] Univ Maryland Sch Dent, Dept Neural & Pain Sci, Baltimore, MD 21201 USA
Source
MEDICAL IMAGING 2023 | 2023 / Vol. 12464
Keywords
Motion Fields; Transformer; Audio Synthesis; MRI
DOI
10.1117/12.2653345
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Investigating the relationship between internal tissue point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI and intelligible speech can aid in advancing speech motor control theories and developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate of the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolution decoder to generate spectrograms that correspond to 4D motion fields. We also incorporate a generative adversarial training approach into our framework to improve the synthesis quality of the generated spectrograms. We experiment on 63 paired motion field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.
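The abstract uses 2D spectrograms as a surrogate for one-dimensional audio waveforms. As a rough illustration of that representation only, the sketch below turns a synthetic 1D tone into a time-by-frequency magnitude array with a naive windowed DFT. The function name `spectrogram`, the frame length, hop size, sample rate, and test tone are all assumptions chosen for illustration; the record does not state the spectrogram settings used in the paper, and this is not the authors' pipeline.

```python
import cmath
import math


def spectrogram(waveform, frame_len=64, hop=32):
    """Magnitude spectrogram via a naive windowed DFT (illustrative only)."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(waveform) - frame_len + 1, hop):
        frame = [waveform[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames  # indexed as [time frame][frequency bin]


# Hypothetical input: one second of a 128 Hz sine sampled at 1024 Hz.
sample_rate = 1024
tone_hz = 128
waveform = [math.sin(2 * math.pi * tone_hz * t / sample_rate)
            for t in range(sample_rate)]

spec = spectrogram(waveform)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

With these assumed settings each frequency bin spans 16 Hz (1024 / 64), so the energy of the 128 Hz tone concentrates in a single bin. A network that predicts such a 2D array can then be paired with a separate spectrogram inversion step to recover a waveform.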
Pages: 5
Related papers
  • [1] Speech Audio Synthesis from Tagged MRI and Non-negative Matrix Factorization via Plastic Transformer
    Liu, Xiaofeng
    Xing, Fangxu
    Stone, Maureen
    Zhuo, Jiachen
    Fels, Sidney
    Prince, Jerry L.
    El Fakhri, Georges
    Woo, Jonghye
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VII, 2023, 14226 : 435 - 445
  • [2] Intermittently tagged real-time MRI reveals internal tongue motion during speech production
    Chen, Weiyi
    Byrd, Dani
    Narayanan, Shrikanth
    Nayak, Krishna S.
    MAGNETIC RESONANCE IN MEDICINE, 2019, 82 (02) : 600 - 613
  • [3] Single syllable tongue motion analysis using tagged cine MRI
    Unay, Devrim
    Ozturk, Cengizhan
    Stone, Maureen
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING, 2014, 17 (08) : 853 - 864
  • [4] Modeling the motion of the internal tongue from tagged cine-MRI images
    Stone, M
    Davis, EP
    Douglas, AS
    NessAiver, M
    Gullapalli, R
    Levine, WS
    Lundberg, A
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2001, 109 (06) : 2974 - 2982
  • [5] Measuring tongue motion from tagged cine-MRI using harmonic phase (HARP) processing
    Parthasarathy, Vijay
    Prince, Jerry L.
    Stone, Maureen
    Murano, Erni Z.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2007, 121 (01) : 491 - 504
  • [6] Strain Map of the Tongue in Normal and ALS Speech Patterns from Tagged and Diffusion MRI
    Xing, Fangxu
    Prince, Jerry L.
    Stone, Maureen
    Reese, Timothy G.
    Atassi, Nazem
    Wedeen, Van J.
    El Fakhri, Georges
    Woo, Jonghye
    MEDICAL IMAGING 2018: IMAGE PROCESSING, 2018, 10574
  • [7] Speech Map: a statistical multimodal atlas of 4D tongue motion during speech from tagged and cine MR images
    Woo, Jonghye
    Xing, Fangxu
    Stone, Maureen
    Green, Jordan
    Reese, Timothy G.
    Brady, Thomas J.
    Wedeen, Van J.
    Prince, Jerry L.
    El Fakhri, Georges
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING-IMAGING AND VISUALIZATION, 2019, 7 (04) : 361 - 373
  • [8] Speech Motion Anomaly Detection via Cross-Modal Translation of 4D Motion Fields from Tagged MRI
    Liu, Xiaofeng
    Xing, Fangxu
    Zhuo, Jiachen
    Stone, Maureen
    Prince, Jerry L.
    El Fakhri, Georges
    Woo, Jonghye
    MEDICAL IMAGING 2024: IMAGE PROCESSING, 2024, 12926
  • [9] ULTRASONIC VISUALIZATION OF TONGUE MOTION DURING SPEECH
    SONIES, BC
    SHAWKER, TH
    HALL, TE
    GERBER, LH
    LEIGHTON, SB
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1981, 70 (03) : 683 - 686
  • [10] Audio Spectrogram Transformer for Synthetic Speech Detection via Speech Formant Analysis
    Cuccovillo, Luca
    Gerhardt, Milica
    Aichroth, Patrick
    2023 IEEE INTERNATIONAL WORKSHOP ON INFORMATION FORENSICS AND SECURITY, WIFS, 2023