DNN-based Acoustic-to-Articulatory Inversion using Ultrasound Tongue Imaging

Cited by: 0
Authors
Porras, Dagoberto [1 ]
Sepulveda-Sepulveda, Alexander [1 ]
Csapo, Tamas Gabor [2 ,3 ]
Affiliations
[1] Univ Ind Santander, Escuela Ingn Elect Elect & Telecomunicac, Bucaramanga, Santander, Colombia
[2] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[3] MTA ELTE Lendület Lingual Articulat Res Grp, Budapest, Hungary
Keywords
articulatory; ultrasound; deep neural networks; inversion; SILENT SPEECH RECOGNITION; MOVEMENTS; FEATURES;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech sounds are produced by the coordinated movement of the speech organs. Several methods are available to model the relation between articulatory movements and the resulting speech signal. The reverse problem is often called acoustic-to-articulatory inversion (AAI). In this paper, we implemented several Deep Neural Networks (DNNs) to estimate articulatory information from the acoustic signal. Several previous works address this task, but most of them use ElectroMagnetic Articulography (EMA) to track the articulatory movement. Compared to EMA, Ultrasound Tongue Imaging (UTI) offers a better cost-benefit ratio when equipment cost, portability, safety and the visualized structures are taken into account. Our goal is therefore to train a DNN that produces UT images from speech input. We also test two approaches to representing the articulatory information: 1) the EigenTongue space and 2) the raw ultrasound image. As objective quality measures for the reconstructed UT images, we use Mean Squared Error (MSE), the Structural Similarity Index (SSIM) and Complex-Wavelet SSIM (CW-SSIM). Our experimental results show that CW-SSIM is the most useful error measure in the UTI context. We tested three system configurations: a) a simple DNN with 2 hidden layers and the 64x64 pixels of a UTI file as the target; b) the same simple DNN, but with ultrasound images projected to the EigenTongue space as the target; and c) a more complex DNN with 5 hidden layers and UTI files projected to the EigenTongue space as the target. In a subjective experiment, the subjects found the neural networks with two hidden layers more suitable for this inversion task.
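The objective measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes 8-bit images (pixel range 0 to 255) flattened to 64x64 = 4096 values, and the image data is synthetic. The function names `mse` and `global_ssim` are hypothetical. The SSIM variant here is a simplification that uses a single window spanning the whole image, whereas SSIM is normally averaged over local windows. CW-SSIM is not covered.

```python
import random


def mse(a, b):
    """Mean squared error between two equally sized flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def global_ssim(a, b, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM over one window covering the whole image (a simplification)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) * (x - mu_a) for x in a) / n
    var_b = sum((y - mu_b) * (y - mu_b) for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Stabilising constants from the standard SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Synthetic stand-ins for a target UT image and a DNN reconstruction
# (assumption: real data would come from ultrasound recordings).
rng = random.Random(0)
target = [rng.uniform(0.0, 255.0) for _ in range(64 * 64)]
predicted = [min(255.0, max(0.0, p + rng.gauss(0.0, 10.0))) for p in target]

print("MSE :", mse(target, predicted))
print("SSIM:", global_ssim(target, predicted))
```

A lower MSE and an SSIM closer to 1 both indicate a reconstruction closer to the target. The two measures can rank reconstructions differently, which is one reason to report more than one.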
Pages: 8
Related papers
50 in total
  • [1] A DNN Based Speech Enhancement Approach to Noise Robust Acoustic-to-Articulatory Inversion
    Shahrebabaki, Abdolreza Sabzi
    Siniscalchi, Sabato Marco
    Salvi, Giampiero
    Svendsen, Torbjorn
    [J]. 2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [2] ACOUSTIC-TO-ARTICULATORY INVERSION USING AN EPISODIC MEMORY
    Demange, S.
    Ouni, S.
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4620 - 4623
  • [3] Acoustic-to-Articulatory Inversion based on Local Regression
    Al Moubayed, Samer
    Ananthakrishnan, G.
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 937 - 940
  • [4] Acoustic-to-Articulatory Inversion Using Particle Swarm Optimization
    Fairee, Suthida
    Sirinaovakul, Booncharoen
    Prom-on, Santitham
    [J]. 2015 12TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING/ELECTRONICS, COMPUTER, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY (ECTI-CON), 2015,
  • [5] Jerk Minimization for Acoustic-To-Articulatory Inversion
    Rajpal, Avni
    Patil, Hemant A.
    [J]. 9th ISCA Speech Synthesis Workshop, SSW 2016, 2016, : 82 - 87
  • [6] Modeling the articulatory space using a hypercube codebook for acoustic-to-articulatory inversion
    Ouni, S
    Laprie, Y
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2005, 118 (01): : 444 - 460
  • [7] Formant Trajectories for Acoustic-to-Articulatory Inversion
    Ozbek, I. Yuecel
    Hasegawa-Johnson, Mark
    Demirekler, Muebeccel
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2783 - +
  • [8] Incorporation of phonetic constraints in acoustic-to-articulatory inversion
    Potard, Blaise
    Laprie, Yves
    Ouni, Slim
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 123 (04): : 2310 - 2323
  • [9] A generalized smoothness criterion for acoustic-to-articulatory inversion
    Ghosh, Prasanta Kumar
    Narayanan, Shrikanth
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2010, 128 (04): : 2162 - 2172
  • [10] A DEEP RECURRENT APPROACH FOR ACOUSTIC-TO-ARTICULATORY INVERSION
    Liu, Peng
    Yu, Quanjie
    Wu, Zhiyong
    Kang, Shiyin
    Meng, Helen
    Cai, Lianhong
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4450 - 4454