THE MICROSOFT 2017 CONVERSATIONAL SPEECH RECOGNITION SYSTEM

Authors
Xiong, W. [1]
Wu, L. [1]
Alleva, F. [1]
Droppo, J. [1]
Huang, X. [1]
Stolcke, A. [1]
Affiliation
[1] Microsoft AI & Res, Redmond, WA 98052 USA
Keywords
Conversational speech recognition; CNN; LACE; BLSTM; LSTM-LM; system combination; human parity; NEURAL-NETWORKS; BACKPROPAGATION; LSTM; IBM
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We describe the latest version of Microsoft's conversational speech recognition system for the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog-session-aware LSTM language models in rescoring. For system combination we adopt a two-stage approach: acoustic model posteriors are first combined at the senone/frame level, followed by word-level voting via confusion networks. We also added another language model rescoring step after the confusion network combination. The resulting system yields a 5.1% word error rate on the NIST 2000 Switchboard test set, and 9.8% on the CallHome subset.
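The two-stage combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear weighted average of posteriors, and the assumption that word hypotheses arrive already aligned into confusion-network slots are all illustrative choices, since the abstract does not specify these details.

```python
from collections import defaultdict


def combine_frame_posteriors(posteriors, weights):
    """Stage 1 (assumed form): weighted linear average of senone posteriors.

    posteriors: one entry per acoustic model; each entry is a list of frames,
                and each frame is a list of senone probabilities.
    weights:    one non-negative weight per model (hypothetical values).
    """
    total = sum(weights)
    n_frames = len(posteriors[0])
    n_senones = len(posteriors[0][0])
    combined = []
    for t in range(n_frames):
        frame = [
            sum(w * model[t][s] for w, model in zip(weights, posteriors)) / total
            for s in range(n_senones)
        ]
        combined.append(frame)
    return combined


def vote_confusion_network(slots):
    """Stage 2 (assumed form): word-level voting over pre-aligned slots.

    slots: list of slots; each slot is a list of (word, posterior) pairs
           contributed by different systems. The empty string "" stands
           for "no word" (a deletion) in that slot.
    """
    hypothesis = []
    for slot in slots:
        scores = defaultdict(float)
        for word, posterior in slot:
            scores[word] += posterior  # accumulate votes per candidate word
        best = max(scores, key=scores.get)
        if best:  # skip slots where "no word" wins
            hypothesis.append(best)
    return hypothesis


# Toy usage with made-up numbers: two models, one frame, two senones.
frame_level = combine_frame_posteriors(
    [[[0.8, 0.2]], [[0.4, 0.6]]], weights=[1.0, 1.0]
)

# Three systems voting on three aligned slots.
words = vote_confusion_network([
    [("hello", 0.9), ("hello", 0.8), ("yellow", 0.7)],
    [("", 0.6), ("uh", 0.3), ("", 0.5)],
    [("world", 0.5), ("word", 0.4), ("world", 0.6)],
])
```

In a real system the alignment of competing hypotheses into slots is the difficult part of confusion network combination; the sketch takes that alignment as given and shows only the voting step.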
Pages: 5934-5938
Page count: 5