CONNECTIONIST APPROACHES TO LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

Authors:

SAWAI, H

MINAMI, Y

MIYATAKE, M

WAIBEL, A

SHIKANO, K

Source:

IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS | 1991 / Vol. 74 / No. 07

DOI:

Not available

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

This paper describes recent progress in a connectionist large-vocabulary continuous speech recognition system that integrates speech recognition and language processing. The speech recognition part consists of large phonemic Time-Delay Neural Networks (TDNNs) that can automatically spot all 24 Japanese phonemes (i.e., 18 consonants /b/, /d/, /g/, /p/, /t/, /k/, /m/, /n/, /N/, /s/, /sh/ ([ʃ]), /h/, /z/, /ch/ ([tʃ]), /ts/, /r/, /w/, /y/ ([j]); 5 vowels /a/, /i/, /u/, /e/, /o/; and the double consonant /Q/ or silence) simply by scanning the input speech, without any specific segmentation technique. The language processing part is a predictive LR parser, guided by an LR parsing table automatically generated from context-free grammar rules, that proceeds left to right without backtracking. Time alignment between the predicted phonemes and the sequence of TDNN phoneme outputs is carried out by dynamic time warping (DTW) matching. We call this hybrid integrated recognition system the 'TDNN-LR' method. Large-vocabulary isolated word and continuous speech recognition using the TDNN-LR method achieved excellent speaker-dependent recognition performance, and incremental training with a small number of training tokens was found to be very effective for adapting to speaking rate. Furthermore, we report several new achievements as extensions of the TDNN-LR method: (1) two proposed NN architectures provide phoneme recognition performance that is robust to variations in speaking manner; (2) speaker adaptation can be realized using an NN mapping function between the input and standard speakers; and (3) new architectures proposed for speaker-independent recognition achieve performance that nearly matches speaker-dependent recognition performance.
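The abstract states that the phonemes predicted by the LR parser are time-aligned with the sequence of TDNN phoneme outputs by DTW matching. As a rough illustration only, the Python sketch below shows one common DTW-style alignment of a predicted phoneme sequence against frame-wise phoneme activations. The local path constraints, the log-activation score, the `FLOOR` value and the names `dtw_align` and `frames` are assumptions chosen for the example; this record does not specify the paper's actual matching details.

```python
import math
from typing import List, Sequence, Tuple

FLOOR = 1e-10  # assumed floor so that zero activations do not produce log(0)


def dtw_align(scores: Sequence[Sequence[float]],
              phonemes: Sequence[int]) -> Tuple[float, List[int]]:
    """Align a predicted phoneme sequence to frame-wise phoneme scores.

    scores[t][p] is a (hypothetical) TDNN output activation for phoneme p at
    frame t. Assumed path constraints: every frame is assigned to exactly one
    phoneme, phonemes keep their predicted order, and each phoneme covers at
    least one frame. Returns (total log score, phoneme position per frame).
    """
    n_frames, n_phon = len(scores), len(phonemes)
    if n_phon == 0 or n_frames < n_phon:
        return -math.inf, []  # no valid alignment exists

    def local(t: int, k: int) -> float:
        # Log activation of the k-th predicted phoneme at frame t.
        return math.log(max(scores[t][phonemes[k]], FLOOR))

    neg_inf = -math.inf
    dp = [[neg_inf] * n_phon for _ in range(n_frames)]
    back = [[0] * n_phon for _ in range(n_frames)]
    dp[0][0] = local(0, 0)  # the path must start in the first phoneme

    for t in range(1, n_frames):
        for k in range(n_phon):
            stay = dp[t - 1][k]  # same phoneme continues
            advance = dp[t - 1][k - 1] if k > 0 else neg_inf  # next phoneme starts
            if stay == neg_inf and advance == neg_inf:
                continue  # state (t, k) is unreachable
            if advance > stay:
                dp[t][k], back[t][k] = advance + local(t, k), k - 1
            else:
                dp[t][k], back[t][k] = stay + local(t, k), k

    # Trace back from the final frame, which must end in the last phoneme.
    path = [0] * n_frames
    k = n_phon - 1
    for t in range(n_frames - 1, -1, -1):
        path[t] = k
        k = back[t][k]
    return dp[-1][-1], path


# Toy example with hypothetical phoneme indices 0=/a/, 1=/k/, 2=/i/.
frames = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.85, 0.05],
    [0.05, 0.10, 0.85],
    [0.10, 0.10, 0.80],
]
score_aki, path_aki = dtw_align(frames, [0, 1, 2])  # hypothesis /a k i/
score_ai, _ = dtw_align(frames, [0, 2])             # hypothesis /a i/
print(path_aki)              # [0, 0, 1, 2, 2]
print(score_aki > score_ai)  # True
```

Under these assumptions, the hypothesis /a k i/ aligns to the toy activations with a higher total log score than /a i/, so a decoder built this way could use the score to rank or prune competing parser predictions. Working in the log domain turns the product of activations into a sum, which keeps long utterances from underflowing.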

Pages: 1834-1844

Number of pages: 11
