THE MICROSOFT 2017 CONVERSATIONAL SPEECH RECOGNITION SYSTEM

Cited by: 0

Authors:

Xiong, W. [1]

Wu, L. [1]

Alleva, F. [1]

Droppo, J. [1]

Huang, X. [1]

Stolcke, A. [1]

Affiliation:

[1] Microsoft AI & Res, Redmond, WA 98052 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Conversational speech recognition; CNN; LACE; BLSTM; LSTM-LM; system combination; human parity; NEURAL-NETWORKS; BACKPROPAGATION; LSTM; IBM;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

We describe the latest version of Microsoft's conversational speech recognition system for the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby acoustic model posteriors are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added another language model rescoring step following the confusion network combination. The resulting system yields a 5.1% word error rate on the NIST 2000 Switchboard test set, and 9.8% on the CallHome subset.
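The two-stage combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the linear weighted averaging, the equal weights, and the assumption that word hypotheses are already aligned into slots are all hypothetical choices made for illustration; the abstract does not say how the posteriors are weighted or how the confusion networks are built.

```python
from typing import Dict, List, Sequence, Tuple


def combine_frame_posteriors(
    model_posteriors: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
) -> List[List[float]]:
    """Stage 1 (assumed form): weighted average of senone posteriors per frame.

    model_posteriors[m][t][s] is model m's posterior for senone s at frame t.
    All models are assumed to share the same senone set and frame count.
    """
    total_weight = sum(weights)
    num_frames = len(model_posteriors[0])
    num_senones = len(model_posteriors[0][0])
    combined = []
    for t in range(num_frames):
        frame = [
            sum(w * post[t][s] for w, post in zip(weights, model_posteriors))
            / total_weight
            for s in range(num_senones)
        ]
        combined.append(frame)
    return combined


def vote_per_slot(
    aligned_slots: Sequence[Sequence[Tuple[str, float]]],
) -> List[str]:
    """Stage 2 (simplified): word-level voting over pre-aligned slots.

    Each slot holds (word, score) pairs, one per system. Scores for the same
    word are summed and the highest-scoring word wins. The empty string
    stands for "no word here" and is dropped from the output.
    """
    result = []
    for slot in aligned_slots:
        scores: Dict[str, float] = {}
        for word, score in slot:
            scores[word] = scores.get(word, 0.0) + score
        best = max(scores, key=lambda w: scores[w])
        if best:
            result.append(best)
    return result


# Toy data: two acoustic models, two frames, two senones.
model_a = [[0.7, 0.3], [0.2, 0.8]]
model_b = [[0.5, 0.5], [0.4, 0.6]]
combined = combine_frame_posteriors([model_a, model_b], weights=[1.0, 1.0])

# Toy data: three systems voting over three aligned word slots.
slots = [
    [("hello", 0.9), ("hello", 0.8), ("hollow", 0.6)],
    [("", 0.5), ("uh", 0.3), ("", 0.4)],
    [("world", 0.7), ("word", 0.6), ("world", 0.5)],
]
voted = vote_per_slot(slots)
print(voted)
```

In a real system the slot alignment is the hard part: confusion network combination has to align the competing hypotheses before any voting can take place, which this sketch takes as given.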

Pages: 5934-5938

Page count: 5
