THE MICROSOFT 2017 CONVERSATIONAL SPEECH RECOGNITION SYSTEM

Cited by: 0

Authors:

Xiong, W. [1]

Wu, L. [1]

Alleva, F. [1]

Droppo, J. [1]

Huang, X. [1]

Stolcke, A. [1]

Affiliation:

[1] Microsoft AI & Research, Redmond, WA 98052, USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Conversational speech recognition; CNN; LACE; BLSTM; LSTM-LM; system combination; human parity; NEURAL-NETWORKS; BACKPROPAGATION; LSTM; IBM;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We describe the latest version of Microsoft's conversational speech recognition system for the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby acoustic model posteriors are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added another language model rescoring step following the confusion network combination. The resulting system yields a 5.1% word error rate on the NIST 2000 Switchboard test set, and 9.8% on the CallHome subset.
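The two-stage combination mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the equal default weights, the linear averaging of posteriors, and the assumption that confusion network slots are already aligned across systems are all hypothetical choices made for illustration. The final language model rescoring step is not shown.

```python
from collections import defaultdict

EPS = "<eps>"  # hypothetical label for an empty (deleted) word in a slot


def combine_frame_posteriors(model_posteriors, weights=None):
    """Stage 1 (sketch): weighted average of senone posteriors per frame.

    model_posteriors: one entry per acoustic model; each entry is a list of
    frames, and each frame is a list of senone probabilities.
    Equal weights are an assumption, not taken from the paper.
    """
    n_models = len(model_posteriors)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_frames = len(model_posteriors[0])
    n_senones = len(model_posteriors[0][0])
    combined = []
    for t in range(n_frames):
        frame = [
            sum(w * m[t][s] for w, m in zip(weights, model_posteriors))
            for s in range(n_senones)
        ]
        total = sum(frame)
        combined.append([p / total for p in frame])  # renormalize per frame
    return combined


def vote_confusion_slots(system_slots, weights=None):
    """Stage 2 (sketch): word-level voting over pre-aligned slots.

    system_slots: one entry per system; each entry is a list of slots, and
    each slot maps a word to its posterior. Slot alignment across systems
    is assumed to have been done already.
    """
    n_systems = len(system_slots)
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    words = []
    for i in range(len(system_slots[0])):
        scores = defaultdict(float)
        for w, slots in zip(weights, system_slots):
            for word, p in slots[i].items():
                scores[word] += w * p
        best = max(scores, key=scores.get)
        if best != EPS:  # an empty winner means no word is emitted
            words.append(best)
    return words
```

In this sketch, the frame-level stage would be applied to models that share a senone set, and the voting stage to the word hypotheses produced by decoding with each combined acoustic model.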

Pages: 5934-5938

Page count: 5
