Region Dependent Transform on MLP Features for Speech Recognition

Cited by: 0
Authors
Ng, Tim [1]
Zhang, Bing [1]
Matsoukas, Spyros [1]
Nguyen, Long [1]
Affiliations
[1] Raytheon BBN Technologies, Cambridge, MA 02138, USA
Keywords
Multi-Layer Perceptrons; bottleneck features; Region Dependent Transform; discriminative training; Mandarin speech recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, the Region Dependent Transform (RDT) is used as a feature extraction process to combine traditional short-term acoustic features with features derived from Multi-Layer Perceptrons (MLPs) trained on long-term features. Compared with the conventional feature augmentation approach, substantial improvement is obtained. Moreover, an improved RDT training procedure that takes speaker-dependent transforms into account is proposed for feature combination in Speaker Adaptive Training. Incorporating the higher-dimensional features output by the layer prior to the bottleneck layer into our Speech-to-Text (STT) system using RDT yields significant improvement over the conventional bottleneck features. In summary, using the MLP-derived features with RDT achieves an 8.2% to 11.4% relative reduction in Character Error Rate for our Mandarin STT systems.
Pages: 228-231
Page count: 4