Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion

被引:0
|
作者
Xie, Xurong [1 ,2 ]
Liu, Xunying [1 ,2 ]
Lee, Tan [1 ]
Wang, Lan [2 ]
机构
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
来源
2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2018年
关键词
articulatory inversion; stacked; deep neural network; mixture density network; EMA;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Acoustic-to-articulatory inversion predicting articulatory movement based on the acoustic signal is useful for many applications like talking head, speech recognition, and education. DNN based technologies have achieved the state-of-the-art performance in the area. This paper investigates different stacked network architectures for acoustic-to-articulatory inversion. Two levels of DNNs or mixture density networks (MDNs) can be connected using different types of auxiliary features, including bottleneck features, directly generated features, and predicted articulatory features via MLPG algorithm extracted from the first level network. For the experiments, stacked systems using DNNs, time-delay DNNs (TDNNs), RNNs and MDNs were evaluated on both the MNGU0 English EMA database and AIMSL Chinese EMA database. Finally, on the default configurations of MNGU0 data using LSF acoustic features, the proposed stacked system using feed-forward MDNs with ellipsoid variance and MLPG generated features got 0.718mm in RMSE, which is similar to the RNN and RNN-MDN BLSTM systems with slower and more difficult training stage.
引用
收藏
页码:36 / 40
页数:5
相关论文
共 50 条
  • [1] A DEEP RECURRENT APPROACH FOR ACOUSTIC-TO-ARTICULATORY INVERSION
    Liu, Peng
    Yu, Quanjie
    Wu, Zhiyong
    Kang, Shiyin
    Meng, Helen
    Cai, Lainhong
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4450 - 4454
  • [2] Acoustic-to-Articulatory Inversion with Deep Autoregressive Articulatory-WaveNet
    Bozorg, Narjes
    Johnson, Michael T.
    INTERSPEECH 2020, 2020, : 3725 - 3729
  • [3] Trajectory mixture density networks with multiple mixtures for acoustic-articulatory inversion
    Richmond, Korin
    ADVANCES IN NONLINEAR SPEECH PROCESSING, 2007, 4885 : 263 - 272
  • [4] Deep Neural Network Based Acoustic-to-articulatory Inversion Using Phone Sequence Information
    Xie, Xurong
    Liu, Xunying
    Wang, Lan
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1497 - 1501
  • [5] Deep Acoustic-to-Articulatory Inversion Mapping with Latent Trajectory Modeling
    Tobing, Patrick Lumban
    Kameoka, Hirokazu
    Toda, Tomoki
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1233 - 1236
  • [6] Jerk Minimization for Acoustic-To-Articulatory Inversion
    Rajpal, Avni
    Patil, Hemant A.
    9th ISCA Speech Synthesis Workshop, SSW 2016, 2016, : 82 - 87
  • [7] Formant Trajectories for Acoustic-to-Articulatory Inversion
    Ozbek, I. Yuecel
    Hasegawa-Johnson, Mark
    Demirekler, Muebeccel
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2783 - +
  • [8] Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models
    Shahrebabaki, Abdolreza Sabzi
    Salvi, Giampiero
    Svendsen, Torbjorn
    Siniscalchi, Sabato Marco
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 135 - 147
  • [9] On smoothing articulatory trajectories obtained from Gaussian mixture model based acoustic-to-articulatory inversion
    Ghosh, Prasanta K.
    Narayanan, Shrikanth S.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 134 (02): : EL258 - EL264
  • [10] Incorporation of phonetic constraints in acoustic-to-articulatory inversion
    Potard, Blaise
    Laprie, Yves
    Ouni, Slim
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 123 (04): : 2310 - 2323