Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: Benefit to speech recognition

Cited by: 0

Authors:

Sudhakar, Prasad [1]

Ghosh, Prasanta Kumar [2]

Affiliations:

[1] Catholic Univ Louvain, ICTEAM ELEN, Louvain La Neuve, Belgium

[2] Indian Inst Sci IISc, Dept Elect Engn, Bangalore, Karnataka, India

Source:

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014

Keywords:

phonetic recognition; acoustic-to-articulatory inversion; smoothing; Gaussian mixture model; sparsity; Chambolle-Pock; ℓ1 minimization

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech recognition using articulatory features estimated by Acoustic-to-Articulatory Inversion (AAI) is considered. A recently proposed sparse smoothing approach is used to post-process the estimates from Gaussian Mixture Model (GMM) based AAI under the Minimum Mean Squared Error (MMSE) criterion. It is well known that low-pass smoothing as post-processing improves AAI performance. Sparse smoothing, on the other hand, not only improves AAI performance but also preserves MMSE optimality for as many estimates as possible. In this work we investigate the benefit of preserving MMSE optimality during post-processing by using the smoothed articulatory estimates in a broad class phonetic recognition task. Experimental results show that low-pass filter based smoothing results in a significant drop in recognition accuracy compared to that obtained with articulatory estimates without any smoothing. However, the recognition accuracy obtained with articulatory features from sparse smoothing is similar to that obtained with articulatory features taken directly from GMM based AAI without any post-processing. Thus, sparse smoothing benefits both inversion performance and recognition accuracy, which is not the case with low-pass smoothing.
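
The idea of smoothing a trajectory while leaving as many estimates as possible untouched can be illustrated with a small sketch. The abstract does not state the paper's exact optimization problem, so the objective below is an assumption chosen for illustration: minimize ||x - y||_1 + (lam/2) ||D x||_2^2, where y is the raw trajectory, D is a second-difference operator, and the l1 term encourages x to equal y at most samples. The function names, the value of lam, the step sizes, and the toy signal are all hypothetical; only the use of l1 minimization and a Chambolle-Pock style primal-dual iteration is taken from the keywords.

```python
import numpy as np


def second_difference(x):
    """Apply D: (Dx)[i] = x[i] - 2*x[i+1] + x[i+2] (length n -> n-2)."""
    return x[:-2] - 2.0 * x[1:-1] + x[2:]


def second_difference_adjoint(z, n):
    """Apply the transpose of D (length n-2 -> n)."""
    out = np.zeros(n)
    out[:-2] += z
    out[1:-1] -= 2.0 * z
    out[2:] += z
    return out


def sparse_smooth(y, lam=5.0, n_iter=3000):
    """Primal-dual (Chambolle-Pock style) iteration for the assumed objective
    ||x - y||_1 + (lam / 2) * ||D x||_2^2. Illustrative only."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Step sizes must satisfy tau * sigma * ||D||^2 < 1; ||D||^2 <= 16 here.
    tau = sigma = 0.24
    x = y.copy()
    x_bar = y.copy()
    z = np.zeros(n - 2)
    for _ in range(n_iter):
        # Dual step: proximal map of the conjugate of (lam/2)*||.||^2.
        z = lam * (z + sigma * second_difference(x_bar)) / (lam + sigma)
        # Primal step: soft-thresholding centred on y keeps x[i] == y[i]
        # whenever the proposed change is smaller than tau.
        v = x - tau * second_difference_adjoint(z, n) - y
        x_new = y + np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
        # Extrapolation step.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x


# Toy trajectory (hypothetical data): a slow sinusoid with three outliers.
t = np.arange(200)
y = np.sin(2.0 * np.pi * 2.0 * t / 200.0)
y[[50, 100, 150]] += 1.0
x = sparse_smooth(y)
print("samples left (nearly) unchanged:",
      int(np.sum(np.abs(x - y) < 1e-6)), "of", y.size)
```

In this toy setting the l1 data term plays the role the abstract attributes to sparse smoothing: only the samples that make the trajectory rough are modified, whereas a low-pass filter would alter every sample.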

Pages: 169-173

Number of pages: 5
