Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: Benefit to speech recognition

Authors
Sudhakar, Prasad [1]
Ghosh, Prasanta Kumar [2]
Affiliations
[1] Catholic Univ Louvain, ICTEAM ELEN, Louvain La Neuve, Belgium
[2] Indian Inst Sci IISc, Dept Elect Engn, Bangalore, Karnataka, India
Source
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014
Keywords
phonetic recognition; acoustic-to-articulatory inversion; smoothing; Gaussian mixture model; sparsity; Chambolle-Pock; l1 minimization
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This work considers speech recognition with articulatory features estimated by acoustic-to-articulatory inversion (AAI). A recently proposed sparse smoothing approach is used to postprocess the estimates obtained from Gaussian mixture model (GMM) based AAI under the minimum mean squared error (MMSE) criterion. Low-pass smoothing as a postprocessing step is well known to improve AAI performance. Sparse smoothing, on the other hand, not only improves AAI performance but also preserves MMSE optimality for as many estimates as possible. We investigate the benefit of preserving MMSE optimality during postprocessing by using the smoothed articulatory estimates in a broad-class phonetic recognition task. Experimental results show that low-pass-filter-based smoothing causes a significant drop in recognition accuracy relative to articulatory estimates without any smoothing. In contrast, the recognition accuracy obtained with sparsely smoothed articulatory features is similar to that obtained with articulatory features taken directly from GMM-based AAI without any postprocessing. Thus, sparse smoothing benefits both inversion performance and recognition accuracy, which is not the case for low-pass smoothing.
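For readers unfamiliar with the term, the "MMSE optimality" mentioned in the abstract can be illustrated with the standard textbook form of the MMSE estimate under a joint GMM. This is a sketch rather than the paper's own notation: the symbols below ($z$ for an acoustic feature vector, $x$ for an articulatory feature vector, $\pi_k$, $\mu_k$, $\Sigma_k$ for the mixture weights, means and covariance blocks of component $k$) are assumptions introduced here, not taken from this record.

```latex
% Assumed notation: joint GMM over acoustic z and articulatory x,
% with component weights \pi_k and block-partitioned means/covariances.
\hat{x}_{\mathrm{MMSE}}(z) = \mathbb{E}[x \mid z]
  = \sum_{k=1}^{K} \gamma_k(z)
    \left[ \mu_k^{x} + \Sigma_k^{xz} \left( \Sigma_k^{zz} \right)^{-1}
           \left( z - \mu_k^{z} \right) \right],
\qquad
\gamma_k(z) = \frac{\pi_k \, \mathcal{N}\!\left(z;\, \mu_k^{z}, \Sigma_k^{zz}\right)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}\!\left(z;\, \mu_j^{z}, \Sigma_j^{zz}\right)}
```

Under this reading, a frame "preserves MMSE optimality" after postprocessing when its smoothed value still equals this per-frame estimate. A low-pass filter generally alters every frame, whereas the sparse approach described in the abstract aims to leave as many frames as possible unchanged.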
Pages: 169-173
Number of pages: 5