Recent advances in conversational speech recognition using convolutional and recurrent neural networks

Cited by: 15

Authors:

Saon, G. [1]

Picheny, M. [1]

Affiliations:

[1] IBM Watson Grp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

IBM JOURNAL OF RESEARCH AND DEVELOPMENT | 2017 / Vol. 61 / Issue 4-5

Keywords:

TRANSCRIPTION; SYSTEM; LSTM;

DOI:

10.1147/JRD.2017.2701178

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Deep learning methodologies have had a major impact on performance across a wide variety of machine learning tasks, and speech recognition is no exception. We describe a set of deep learning techniques that proved to be particularly successful in achieving performance gains in word error rate on a popular large vocabulary conversational speech recognition benchmark task ("Switchboard"). We found that the best performance is achieved by combining features from both recurrent and convolutional neural networks. We compare two recurrent architectures: partially unfolded nets with max-out activations and bidirectional long short-term memory nets. In addition, inspired by the success of convolutional networks for image classification, we designed a convolutional net with many convolutional layers and small kernels that create a receptive field with more nonlinearity and fewer parameters than standard configurations. When combined, these neural networks achieve a word error rate of 6.2% on this difficult task; this was the best reported rate at the time of this writing and is even more remarkable given that human performance itself is estimated to be 4% on this data.
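The abstract's claim that many small-kernel layers give "more nonlinearity and fewer parameters" can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the kernel sizes (3x3 versus 7x7), the layer counts, and the channel width of 64 are assumptions chosen only for illustration, and the helper `stacked_conv_stats` is a hypothetical name. It assumes stride-1 convolutions with no pooling and ignores bias terms.

```python
def stacked_conv_stats(num_layers, kernel, channels):
    """Receptive field and weight count for a stack of square convolutions.

    Assumes stride 1, no pooling, equal input/output channel counts,
    and ignores bias terms. Illustrative helper, not from the paper.
    """
    # Each stride-1 layer widens the receptive field by (kernel - 1).
    receptive_field = 1 + num_layers * (kernel - 1)
    # Each layer holds kernel * kernel * channels_in * channels_out weights.
    weights = num_layers * kernel * kernel * channels * channels
    return receptive_field, weights


# Hypothetical channel width, chosen only for illustration.
channels = 64

# Three stacked 3x3 layers (three nonlinearities) ...
small_rf, small_weights = stacked_conv_stats(num_layers=3, kernel=3, channels=channels)
# ... versus a single 7x7 layer (one nonlinearity).
large_rf, large_weights = stacked_conv_stats(num_layers=1, kernel=7, channels=channels)

print("3 x (3x3):", small_rf, small_weights)
print("1 x (7x7):", large_rf, large_weights)
```

Under these assumptions both configurations cover the same 7x7 input window, but the stacked version applies a nonlinearity after each of its three layers and uses roughly half as many weights.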

Pages: 10
