Region Dependent Transform on MLP Features for Speech Recognition

Cited by: 0

Authors:

Ng, Tim [1]

Zhang, Bing [1]

Matsoukas, Spyros [1]

Nguyen, Long [1]

Affiliation:

[1] Raytheon BBN Technol, Cambridge, MA 02138 USA

Source:

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011

Keywords:

Multi-Layer Perceptrons; bottleneck features; Region Dependent Transform; discriminative training; Mandarin speech recognition;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this work, Region Dependent Transform (RDT) is used as a feature extraction process to combine traditional short-term acoustic features with features derived from Multi-Layer Perceptrons (MLPs) trained on long-term features. Compared to the conventional feature augmentation approach, substantial improvement is obtained. Moreover, an improved RDT training procedure, in which speaker dependent transforms are taken into account, is proposed for feature combination in Speaker Adaptive Training. By using RDT to incorporate the higher dimensional features output from the layer prior to the bottleneck layer into our Speech-to-Text (STT) system, significant improvement is achieved compared to using the conventional bottleneck features. In summary, by using the features derived from MLPs with RDT, an 8.2% to 11.4% relative reduction in Character Error Rate is achieved for our Mandarin STT systems.
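As a rough illustration of the feature combination described in the abstract, the sketch below shows one common formulation of a region dependent transform: the input space is softly partitioned by a Gaussian mixture, each region has its own affine projection, and the output is the posterior-weighted sum of the per-region projections. The abstract does not give these details, so everything here is an assumption made for illustration: the function names, the diagonal-covariance mixture, the choice to compute region posteriors on the concatenated short-term and MLP features, and all dimensions. It is not the authors' implementation, and the discriminative training of the transforms is not shown.

```python
import math


def region_posteriors(x, means, variances, weights):
    """Posterior of each diagonal-Gaussian region given feature vector x.

    Hypothetical helper: the region model used in the paper is not
    specified in the abstract.
    """
    log_likes = []
    for mu, var, w in zip(means, variances, weights):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_likes.append(ll)
    # Normalize in the log domain for numerical stability.
    top = max(log_likes)
    exps = [math.exp(ll - top) for ll in log_likes]
    total = sum(exps)
    return [e / total for e in exps]


def rdt_transform(short_term, mlp_feats, gmm, transforms):
    """Combine two feature streams with per-region affine projections.

    gmm        -- (means, variances, weights) over the concatenated input
    transforms -- one (A, b) pair per region; A has out_dim rows
    Assumption: posteriors are computed on the concatenated vector.
    """
    x = list(short_term) + list(mlp_feats)
    post = region_posteriors(x, *gmm)
    out_dim = len(transforms[0][0])
    y = [0.0] * out_dim
    for gamma, (A, b) in zip(post, transforms):
        for i in range(out_dim):
            proj = sum(a * xi for a, xi in zip(A[i], x)) + b[i]
            y[i] += gamma * proj
    return y
```

With a single region this reduces to one global affine projection of the concatenated features; the per-region weighting is what lets the projection vary across the acoustic space.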

Pages: 228-231

Page count: 4
