Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: Benefit to speech recognition

Cited by: 0
Authors
Sudhakar, Prasad [1 ]
Ghosh, Prasanta Kumar [2 ]
Affiliations
[1] Catholic Univ Louvain, ICTEAM ELEN, Louvain La Neuve, Belgium
[2] Indian Inst Sci IISc, Dept Elect Engn, Bangalore, Karnataka, India
Source
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014
Keywords
phonetic recognition; acoustic-to-articulatory inversion; smoothing; Gaussian mixture model; sparsity; Chambolle-Pock; l(1) minimization;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech recognition using articulatory features estimated by Acoustic-to-Articulatory Inversion (AAI) is considered. A recently proposed sparse smoothing approach is used to post-process the estimates from Gaussian Mixture Model (GMM) based AAI under the Minimum Mean Squared Error (MMSE) criterion. It is well known that low-pass smoothing as post-processing improves AAI performance. Sparse smoothing, on the other hand, not only improves AAI performance but also preserves MMSE optimality for as many estimates as possible. In this work we investigate the benefit of preserving MMSE optimality during post-processing by using the smoothed articulatory estimates in a broad-class phonetic recognition task. Experimental results show that low-pass filter based smoothing causes a significant drop in recognition accuracy compared to using articulatory estimates without any smoothing. In contrast, the recognition accuracy obtained with articulatory features from sparse smoothing is similar to that obtained with articulatory features taken directly from GMM based AAI without any post-processing. Thus, sparse smoothing is beneficial in terms of both inversion performance and recognition accuracy, which is not the case for low-pass smoothing.
Pages: 169-173
Number of pages: 5
Related papers
50 in total
  • [41] Deep Neural Network Based Acoustic-to-articulatory Inversion Using Phone Sequence Information
    Xie, Xurong
    Liu, Xunying
    Wang, Lan
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1497 - 1501
  • [42] Speaker-Adaptive Acoustic-Articulatory Inversion Using Cascaded Gaussian Mixture Regression
    Hueber, Thomas
    Girin, Laurent
    Alameda-Pineda, Xavier
    Bailly, Gerard
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (12) : 2246 - 2259
  • [43] Acoustic-to-Articulatory Inversion of a Three-dimensional Theoretical Vocal Tract Model Using Deep Learning-based Model
    Lapthawan, Thanat
    Prom-on, Santitham
    2019 IEEE 10TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY (ICAST 2019), 2019, : 52 - 56
  • [44] DEEP-LEVEL ACOUSTIC-TO-ARTICULATORY MAPPING FOR DBN-HMM BASED PHONE RECOGNITION
    Badino, Leonardo
    Canevari, Claudia
    Fadiga, Luciano
    Metta, Giorgio
    2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 370 - 375
  • [45] ACOUSTIC-TO-ARTICULATORY INVERSION FOR DYSARTHRIC SPEECH: ARE PRE-TRAINED SELF-SUPERVISED REPRESENTATIONS FAVORABLE?
    Maharana, Sarthak Kumar
    Adidam, Krishna Kamal
    Nandi, Shoumik
    Srivastava, Ajitesh
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 408 - 412
  • [46] ARTICULATORY FEATURES FROM DEEP NEURAL NETWORKS AND THEIR ROLE IN SPEECH RECOGNITION
    Mitra, Vikramjit
    Sivaraman, Ganesh
    Nam, Hosung
    Espy-Wilson, Carol
    Saltzman, Elliot
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [47] Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model
    Toda, Tomoki
    Black, Alan W.
    Tokuda, Keiichi
    SPEECH COMMUNICATION, 2008, 50 (03) : 215 - 227
  • [48] Articulatory Features Based TDNN Model for Spoken Language Recognition
    Yu, Jiawei
    Guo, Minghao
    Xie, Yanlu
    Zhang, Jinsong
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 308 - 312
  • [49] Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition
    Mitra, Vikramjit
    Sivaraman, Ganesh
    Nam, Hosung
    Espy-Wilson, Carol
    Saltzman, Elliot
    Tiede, Mark
    SPEECH COMMUNICATION, 2017, 89 : 103 - 112
  • [50] Cross-speaker Acoustic-to-Articulatory Inversion using Phone-based Trajectory HMM for Pronunciation Training
    Hueber, Thomas
    Ben-Youssef, Atef
    Bailly, Gerard
    Badin, Pierre
    Elisei, Frederic
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 782 - 785