Recent advances in conversational speech recognition using convolutional and recurrent neural networks

Cited by: 15

Authors:

Saon, G. [1]

Picheny, M. [1]

Affiliations:

[1] IBM Watson Grp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

IBM JOURNAL OF RESEARCH AND DEVELOPMENT | 2017 / Vol. 61 / Issue 4-5

Keywords:

TRANSCRIPTION; SYSTEM; LSTM;

DOI:

10.1147/JRD.2017.2701178

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Deep learning methodologies have had a major impact on performance across a wide variety of machine learning tasks, and speech recognition is no exception. We describe a set of deep learning techniques that proved to be particularly successful in achieving performance gains in word error rate on a popular large vocabulary conversational speech recognition benchmark task ("Switchboard"). We found that the best performance is achieved by combining features from both recurrent and convolutional neural networks. We compare two recurrent architectures: partially unfolded nets with max-out activations and bidirectional long short-term memory nets. In addition, inspired by the success of convolutional networks for image classification, we designed a convolutional net with many convolutional layers and small kernels that create a receptive field with more nonlinearity and fewer parameters than standard configurations. When combined, these neural networks achieve a word error rate of 6.2% on this difficult task; this was the best reported rate at the time of this writing and is even more remarkable given that human performance itself is estimated to be 4% on this data.
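The abstract's claim that many small-kernel layers give a receptive field with more nonlinearity and fewer parameters than a standard configuration can be checked with simple arithmetic. The sketch below is illustrative only: the function name `stacked_conv_stats`, the 3x3 versus 7x7 comparison, and the channel count of 64 are assumptions chosen for the example and are not taken from the paper.

```python
def stacked_conv_stats(num_layers, kernel, channels):
    """Receptive field and weight count for a stack of stride-1 convolutions.

    Assumes square kernel x kernel filters, the same number of input and
    output channels in every layer, and no biases.
    """
    # Each extra stride-1 layer widens the receptive field by (kernel - 1).
    receptive_field = 1 + num_layers * (kernel - 1)
    # Each layer holds kernel * kernel * channels_in * channels_out weights.
    weights = num_layers * kernel * kernel * channels * channels
    return receptive_field, weights


# Hypothetical channel count, chosen only for illustration.
channels = 64

# Three 3x3 layers: three nonlinearities can be applied, one after each layer.
small = stacked_conv_stats(num_layers=3, kernel=3, channels=channels)
# One 7x7 layer: the same receptive field, but only one nonlinearity.
large = stacked_conv_stats(num_layers=1, kernel=7, channels=channels)

print(small)  # (7, 110592)
print(large)  # (7, 200704)
```

Under these assumptions both configurations cover a 7x7 input region, but the stack of small kernels uses roughly 45% fewer weights and allows an activation function after each of its three layers.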

Pages: 10
