Synthesizing Audio from Tongue Motion During Speech Using Tagged MRI Via Transformer

Cited by: 2
Authors
Liu, Xiaofeng [1, 2]
Xing, Fangxu [1, 2]
Prince, Jerry L. [3]
Stone, Maureen [4]
El Fakhri, Georges [1, 2]
Woo, Jonghye [1, 2]
Affiliations
[1] Massachusetts Gen Hosp, Gordon Ctr Med Imaging, Boston, MA 02114 USA
[2] Harvard Med Sch, Boston, MA 02114 USA
[3] Johns Hopkins Univ, Dept Elect & Comp Engn, Baltimore, MD 21218 USA
[4] Univ Maryland Sch Dent, Dept Neural & Pain Sci, Baltimore, MD 21201 USA
Source
MEDICAL IMAGING 2023 | 2023 / Vol. 12464
Keywords
Motion Fields; Transformer; Audio Synthesis; MRI
DOI
10.1117/12.2653345
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Investigating the relationship between internal tissue point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI and intelligible speech can aid in advancing speech motor control theories and developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate of the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolution decoder to generate spectrograms that correspond to 4D motion fields. Furthermore, we incorporate a generative adversarial training approach into our framework to improve the synthesis quality of the generated spectrograms. We experiment on 63 paired motion field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.
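The abstract uses 2D spectrograms as a surrogate for one-dimensional audio waveforms. As a minimal illustration of that change of representation, the sketch below turns a toy 1D signal into a 2D time-frequency array with a plain magnitude short-time Fourier transform. This is an assumption-laden example, not the authors' pipeline: the abstract does not state which spectrogram variant, window, or frame settings were used, and the function name `stft_magnitude`, the frame length, the hop size, and the synthetic tone are all hypothetical choices made for illustration.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a 2D list indexed as [frame][frequency_bin] of DFT magnitudes.

    Illustrative only: frame_len and hop are arbitrary, not values from the paper.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Direct DFT of one windowed frame at frequency bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        frames.append(row)
    return frames


# Toy 1D "waveform": a pure 1000 Hz tone sampled at 8000 Hz (hypothetical values).
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = stft_magnitude(signal)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

With these toy settings each bin spans 8000 / 64 = 125 Hz, so the tone's energy concentrates in a single bin of every frame. The resulting 2D array is the kind of image-like target that a 2D convolutional decoder can generate, which is the structural point the abstract makes.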
Pages: 5
Related papers
50 in total
  • [21] Quantifying Velopharyngeal Motion Variation in Speech Sound Production Using an Audio-Informed Dynamic MRI Atlas
    Xing, Fangxu
    Jin, Riwei
    Gilbert, Imani
    El Fakhri, Georges
    Perry, Jamie
    Sutton, Bradley
    Woo, Jonghye
    MEDICAL IMAGING 2023, 2023, 12464
  • [22] 3D Tongue Motion from Tagged and Cine MR Images
    Xing, Fangxu
    Woo, Jonghye
    Murano, Emi Z.
    Lee, Junghoon
    Stone, Maureen
    Prince, Jerry L.
    MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION (MICCAI 2013), PT III, 2013, 8151 : 41 - 48
  • [23] PET cardiac contractile motion compensation via tagged MRI priors.
    Klein, GJ
    Jamsheed, HC
    Reutter, BW
    Saloner, DA
    Schreck, CE
    Botvinick, EH
    Budinger, TF
    Huesman, RH
    JOURNAL OF NUCLEAR MEDICINE, 2001, 42 (05) : 8P - 8P
  • [24] Audio Transformer for Synthetic Speech Detection via Benford's Law Distribution Analysis
    Ashoka, Anitha Bhat Talagini
    Cuccovillo, Luca
    Aichroth, Patrick
    PROCEEDINGS OF THE 3RD ACM INTERNATIONAL WORKSHOP ON MULTIMEDIA AI AGAINST DISINFORMATION, MAD 2024, 2024, : 23 - 29
  • [25] DETERMINING TONGUE BODY MOTION FROM ACOUSTIC SPEECH WAVE
    HAFER, EH
    COKER, CH
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1975, 57 : S3 - S3
  • [26] Recording high quality speech during tagged cine-MRI studies using a fiber optic microphone
    NessAiver, MS
    Stone, M
    Parthasarathy, V
    Kahana, Y
    Paritsky, A
    JOURNAL OF MAGNETIC RESONANCE IMAGING, 2006, 23 (01) : 92 - 97
  • [27] Tagged-to-Cine MRI Sequence Synthesis via Light Spatial-Temporal Transformer
    Liu, Xiaofeng
    Xing, Fangxu
    Bian, Zhangxing
    Arias-Vergara, Tomas
    Perez-Toro, Paula Andrea
    Maier, Andreas
    Stone, Maureen
    Zhuo, Jiachen
    Prince, Jerry L.
    Woo, Jonghye
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VII, 2024, 15007 : 701 - 711
  • [28] DRIMET: Deep Registration for 3D Incompressible Motion Estimation in Tagged-MRI with Application to the Tongue
    Bian, Zhangxing
    Xing, Fangxu
    Yu, Jinglun
    Shao, Muhan
    Liu, Yihao
    Carass, Aaron
    Zhuo, Jiachen
    Woo, Jonghye
    Prince, Jerry L.
    MEDICAL IMAGING WITH DEEP LEARNING, VOL 227, 2023, 227 : 134 - 150
  • [29] DEFORMABLE MESH MODEL OF CARDIAC MOTION FROM TAGGED MRI DATA
    Parages, Felipe M.
    Wernick, Miles N.
    Denney, Thomas S., Jr.
    Brankov, Jovan G.
    2009 IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING: FROM NANO TO MACRO, VOLS 1 AND 2, 2009, : 213 - +
  • [30] Analysis of 3-D Tongue Motion From Tagged Magnetic Resonance Images
    Xing, Fangxu
    Woo, Jonghye
    Lee, Junghoon
    Murano, Emi Z.
    Stone, Maureen
    Prince, Jerry L.
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2016, 59 (03) : 468 - 479