Recent advances in conversational speech recognition using convolutional and recurrent neural networks

Cited by: 15
Authors
Saon, G. [1 ]
Picheny, M. [1 ]
Affiliations
[1] IBM Watson Grp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
TRANSCRIPTION; SYSTEM; LSTM
DOI
10.1147/JRD.2017.2701178
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep learning methodologies have had a major impact on performance across a wide variety of machine learning tasks, and speech recognition is no exception. We describe a set of deep learning techniques that proved particularly successful in reducing word error rate on a popular large-vocabulary conversational speech recognition benchmark task ("Switchboard"). We found that the best performance is achieved by combining features from both recurrent and convolutional neural networks. We compare two recurrent architectures: partially unfolded nets with max-out activations and bidirectional long short-term memory nets. In addition, inspired by the success of convolutional networks for image classification, we designed a convolutional net with many convolutional layers and small kernels that create a receptive field with more nonlinearity and fewer parameters than standard configurations. When combined, these neural networks achieve a word error rate of 6.2% on this difficult task; this was the best reported rate at the time of this writing and is even more remarkable given that human performance itself is estimated to be 4% on this data.
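The abstract's claim that many small-kernel layers give a receptive field with fewer parameters can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the kernel sizes (3x3 versus 7x7), the channel count of 64, the stride of 1, and the helper names `receptive_field` and `conv_params` are all assumptions chosen for illustration, and biases are ignored.

```python
def receptive_field(num_layers: int, kernel: int) -> int:
    """Receptive field (one axis) of stacked stride-1 convolutions."""
    # Each stride-1 layer widens the field by (kernel - 1) positions.
    return 1 + num_layers * (kernel - 1)


def conv_params(num_layers: int, kernel: int, channels: int) -> int:
    """Weight count for stacked 2-D convolutions, biases ignored.

    Assumes every layer maps `channels` inputs to `channels` outputs
    (an illustrative simplification, not the paper's configuration).
    """
    return num_layers * kernel * kernel * channels * channels


# Hypothetical comparison: three 3x3 layers versus one 7x7 layer.
stacked = (receptive_field(3, 3), conv_params(3, 3, 64))
single = (receptive_field(1, 7), conv_params(1, 7, 64))

print("three 3x3 layers:", stacked)
print("one 7x7 layer:   ", single)
```

Under these assumptions both configurations cover the same 7x7 input window, but the stacked version uses 27·C² weights against 49·C² and applies three nonlinearities instead of one.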
Pages: 10