Recent advances in conversational speech recognition using convolutional and recurrent neural networks

Times cited: 15
Authors
Saon, G. [1]
Picheny, M. [1]
Affiliation
[1] IBM Watson Grp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
TRANSCRIPTION; SYSTEM; LSTM;
DOI
10.1147/JRD.2017.2701178
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Deep learning methodologies have had a major impact on performance across a wide variety of machine learning tasks, and speech recognition is no exception. We describe a set of deep learning techniques that proved particularly successful in reducing word error rate on a popular large vocabulary conversational speech recognition benchmark task ("Switchboard"). We found that the best performance is achieved by combining features from both recurrent and convolutional neural networks. We compare two recurrent architectures: partially unfolded nets with maxout activations and bidirectional long short-term memory nets. In addition, inspired by the success of convolutional networks for image classification, we designed a convolutional net with many convolutional layers and small kernels that create a receptive field with more nonlinearity and fewer parameters than standard configurations. When combined, these neural networks achieve a word error rate of 6.2% on this difficult task; this was the best reported rate at the time of this writing and is even more remarkable given that human performance itself is estimated to be 4% on this data.
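The claim that many small kernels yield a receptive field with more nonlinearity and fewer parameters is the usual argument borrowed from VGG-style image classifiers. The sketch below checks that arithmetic under stated assumptions: stride-1, undilated, square kernels; the same channel count C in and out of every layer; one nonlinearity after each layer; biases ignored. The helper names and the value C = 64 are hypothetical and illustrative only; they are not taken from the paper's actual network configuration.

```python
from typing import NamedTuple


class ConvStack(NamedTuple):
    kernel: int          # square kernel width/height
    layers: int          # number of stacked stride-1 convolutions
    receptive: int       # receptive field width of the whole stack
    weights: int         # weight count, biases excluded
    nonlinearities: int  # assumed: one activation after each layer


def describe(kernel: int, layers: int, channels: int) -> ConvStack:
    # With stride 1 and no dilation, each extra layer widens the
    # receptive field by (kernel - 1) along each dimension.
    receptive = layers * (kernel - 1) + 1
    # Each layer maps `channels` feature maps to `channels` feature maps.
    weights = layers * kernel * kernel * channels * channels
    return ConvStack(kernel, layers, receptive, weights, layers)


if __name__ == "__main__":
    C = 64  # hypothetical channel count, for illustration only
    for big, n_small in [(5, 2), (7, 3)]:
        single = describe(big, 1, C)
        stacked = describe(3, n_small, C)
        # Both configurations see the same input window.
        assert single.receptive == stacked.receptive
        print(f"{big}x{big} x1: {single.weights} weights, "
              f"{single.nonlinearities} nonlinearity")
        print(f"3x3 x{n_small}: {stacked.weights} weights, "
              f"{stacked.nonlinearities} nonlinearities")
```

Under these assumptions the ratio does not depend on C: two 3x3 layers use 18C² weights against 25C² for one 5x5 layer, and three 3x3 layers use 27C² against 49C² for one 7x7 layer, while inserting one extra nonlinearity per added layer.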
Pages: 10