CONNECTIONIST APPROACHES TO LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

被引:0
|
作者
SAWAI, H
MINAMI, Y
MIYATAKE, M
WAIBEL, A
SHIKANO, K
机构
关键词
D O I
暂无
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
This paper describes recent progress in a connectionist large-vocabulary continuous speech recognition system integrating speech recognition and language processing. The speech recognition part consists of Large Phonemic Time-Delay Neural Networks (TDNNs) which can automatically spot all 24 Japanese phonemes (i.e., 18 consonants /b/, /d/, /g/, /p/, /t/, /k/, /m/, /n/, /N/, /s/, /sh/ ([integral]), /h/, /z/, /ch/ ([t-integral]), /ts/, /r/, /w/, /y/ ([j]) and 5 vowels /a/, /i/, /u/, /e/, /o/ and a double consonant /Q/ or silence) by simply scanning among input speech without any specific segmentation techniques. On the other hand, the language processing part is made up of a predictive LR parser in which the LR parser is guided by the LR parsing table automatically generated from context-free grammar rules, and proceeds left-to-right without backtracking. Time alignment between the predicted phonemes and a sequence of the TDNN phoneme outputs is carried out by the DTW matching method. We call this 'hybrid' integrated recognition system the 'TDNN-LR' method. We report that large-vocabulary isolated word and continuous speech recognition using the TDNN-LR method provided excellent speaker-dependent recognition performance, where incremental training using a small number of training tokens is found to be very effective for adaptation of speaking rate. Furthermore, we report some new achievements as extensions of the TDNN-LR method: (1) two proposed NN architectures provide robust phoneme recognition performance on variations of speaking manner, (2) a speaker-adaptation technique can be realized using a NN mapping function between input and standard speakers and (3) new architectures proposed for speaker-independent recognition provide performance that nearly matches speaker-dependent recognition performance.
引用
收藏
页码:1834 / 1844
页数:11
相关论文
共 50 条
  • [1] Connectionist language modeling for large vocabulary continuous speech recognition
    Schwenk, H
    Gauvain, JL
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 765 - 768
  • [2] Boosting the performance of connectionist large vocabulary speech recognition
    Cook, G
    Robinson, T
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1305 - 1308
  • [3] Vietnamese Large Vocabulary Continuous Speech Recognition
    Ngoc Thang Vu
    Schultz, Tanja
    [J]. 2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 333 - 338
  • [4] Advances in large vocabulary continuous speech recognition
    Zweig, G
    Picheny, M
    [J]. ADVANCES IN COMPUTERS, VOL. 60: INFORMATION SECURITY, 2004, 60 : 249 - 291
  • [5] Confidence measures for large vocabulary continuous speech recognition
    Wessel, F
    Schlüter, R
    Macherey, K
    Ney, H
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (03): : 288 - 298
  • [6] Combating Reverberation in Large Vocabulary Continuous Speech Recognition
    Mitra, Vikramjit
    Van Hout, Julien
    McLaren, Mitchell
    Wang, Wen
    Graciarena, Martin
    Vergyri, Dimitra
    Franco, Horacio
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2449 - 2453
  • [7] Utilizing Lipreading in Large Vocabulary Continuous Speech Recognition
    Palecek, Karel
    [J]. SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 767 - 776
  • [8] The RWTH large vocabulary continuous speech recognition system
    Ney, H
    Welling, L
    Ortmanns, S
    Beulen, K
    Wessel, F
    [J]. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 853 - 856
  • [9] Developments in large vocabulary, continuous speech recognition of German
    AddaDecker, M
    Adda, G
    Lamel, L
    Gauvain, JL
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 153 - 156
  • [10] Accent Issues in Large Vocabulary Continuous Speech Recognition
    Chao Huang
    Tao Chen
    Eric Chang
    [J]. International Journal of Speech Technology, 2004, 7 (2-3) : 141 - 153