Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: Benefit to speech recognition

Cited by: 0

Authors:

Sudhakar, Prasad ^{[1]}

Ghosh, Prasanta Kumar ^{[2]}

Affiliations:

[1] Catholic Univ Louvain, ICTEAM ELEN, Louvain La Neuve, Belgium

[2] Indian Inst Sci IISc, Dept Elect Engn, Bangalore, Karnataka, India

Source:

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014

Keywords:

phonetic recognition; acoustic-to-articulatory inversion; smoothing; Gaussian mixture model; sparsity; Chambolle-Pock; ℓ1 minimization

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Speech recognition using articulatory features estimated by acoustic-to-articulatory inversion (AAI) is considered. A recently proposed sparse smoothing approach is used to post-process the estimates from Gaussian mixture model (GMM) based AAI, which are obtained under the minimum mean squared error (MMSE) criterion. It is well known that low-pass smoothing as post-processing improves AAI performance. Sparse smoothing, on the other hand, not only improves AAI performance but also preserves MMSE optimality for as many estimates as possible. In this work we investigate the benefit of preserving MMSE optimality during post-processing by using the smoothed articulatory estimates in a broad-class phonetic recognition task. Experimental results show that low-pass filter based smoothing causes a significant drop in recognition accuracy compared with using articulatory estimates without any smoothing. In contrast, the recognition accuracy obtained with articulatory features from sparse smoothing is similar to that obtained with articulatory features taken directly from GMM based AAI without any post-processing. Thus, sparse smoothing benefits both inversion performance and recognition accuracy, which is not the case with low-pass smoothing.

Pages: 169-173

Page count: 5
