THE MICROSOFT 2017 CONVERSATIONAL SPEECH RECOGNITION SYSTEM

Cited by: 0
Authors
Xiong, W. [1 ]
Wu, L. [1 ]
Alleva, F. [1 ]
Droppo, J. [1 ]
Huang, X. [1 ]
Stolcke, A. [1 ]
Affiliations
[1] Microsoft AI & Research, Redmond, WA 98052 USA
Keywords
Conversational speech recognition; CNN; LACE; BLSTM; LSTM-LM; system combination; human parity; NEURAL-NETWORKS; BACKPROPAGATION; LSTM; IBM;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We describe the latest version of Microsoft's conversational speech recognition system for the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby acoustic model posteriors are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added another language model rescoring step following the confusion network combination. The resulting system yields a 5.1% word error rate on the NIST 2000 Switchboard test set, and 9.8% on the CallHome subset.
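The two-stage combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform default weights, and the assumption that the word hypotheses have already been aligned into confusion-network slots are all hypothetical choices made for illustration. Stage 1 averages senone posteriors from several acoustic models frame by frame; stage 2 picks, in each slot, the word with the highest weighted posterior, where an empty string stands for "no word".

```python
from collections import defaultdict


def combine_frame_posteriors(posteriors_per_model, weights=None):
    """Stage 1 (sketch): weighted average of senone posteriors per frame.

    posteriors_per_model: one entry per acoustic model, each a list of
    frames, each frame a list of senone posteriors. All models are assumed
    to share the same senone set and frame count (an assumption here).
    """
    n_models = len(posteriors_per_model)
    if weights is None:
        # Uniform weights are an illustrative default, not from the paper.
        weights = [1.0 / n_models] * n_models
    n_frames = len(posteriors_per_model[0])
    n_senones = len(posteriors_per_model[0][0])
    combined = []
    for t in range(n_frames):
        frame = [
            sum(w * model[t][s] for w, model in zip(weights, posteriors_per_model))
            for s in range(n_senones)
        ]
        total = sum(frame)
        # Renormalize so each combined frame is again a distribution.
        combined.append([p / total for p in frame])
    return combined


def vote_confusion_slots(aligned_hypotheses, system_weights=None):
    """Stage 2 (sketch): word-level voting over pre-aligned slots.

    aligned_hypotheses: one entry per system, each an equal-length list of
    (word, posterior) pairs; the empty string "" marks a deleted word.
    Building the alignment itself is outside the scope of this sketch.
    """
    n_systems = len(aligned_hypotheses)
    if system_weights is None:
        system_weights = [1.0 / n_systems] * n_systems
    n_slots = len(aligned_hypotheses[0])
    output = []
    for i in range(n_slots):
        scores = defaultdict(float)
        for w, hyp in zip(system_weights, aligned_hypotheses):
            word, posterior = hyp[i]
            scores[word] += w * posterior
        best = max(scores, key=scores.get)
        if best:  # skip slots where "no word" wins the vote
            output.append(best)
    return output
```

In a full system the output of stage 1 would drive a decoder that produces the word hypotheses consumed by stage 2; the sketch leaves that step, and the final language model rescoring, out.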
Pages: 5934-5938
Page count: 5
Related papers
50 in total
  • [21] Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features
    Hartmann, William
    Hsiao, Roger
    Ng, Tim
    Ma, Jeff
    Keith, Francis
    Siu, Man-Hung
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 112 - 116
  • [22] Densely Connected Networks for Conversational Speech Recognition
    Han, Kyu J.
    Chandrashekaran, Akshay
    Kim, Jungsuk
    Lane, Ian
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 796 - 800
  • [23] Toward Human Parity in Conversational Speech Recognition
    Xiong, Wayne
    Droppo, Jasha
    Huang, Xuedong
    Seide, Frank
    Seltzer, Michael L.
    Stolcke, Andreas
    Yu, Dong
    Zweig, Geoffrey
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (12) : 2410 - 2423
  • [24] ROLE ANNOTATED SPEECH RECOGNITION FOR CONVERSATIONAL INTERACTIONS
    Flemotomos, Nikolaos
    Chen, Zhuohao
    Atkins, David C.
    Narayanan, Shrikanth
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 1036 - 1043
  • [25] Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction
    Campos-Soberanis, Mario
    Campos-Sobrino, Diego
    Viana-Cámara, Rafael
    [J]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2021, 13068 LNAI : 46 - 58
  • [26] Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction
    Campos-Soberanis, Mario
    Campos-Sobrino, Diego
    Viana-Camara, Rafael
    [J]. ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II, 2021, 13068 : 46 - 58
  • [27] Progress on Mandarin conversational telephone speech recognition
    Hwang, MY
    Lei, X
    Ng, T
    Bulyko, I
    Ostendorf, M
    Stolcke, A
    Wang, W
    Zheng, J
    Gadde, VRR
    Graciarena, M
    Siu, MH
    Huang, Y
    [J]. 2004 International Symposium on Chinese Spoken Language Processing, Proceedings, 2004, : 1 - 4
  • [28] Attention Shift Decoding for Conversational Speech Recognition
    Kumaran, Raghunandan
    Bilmes, Jeff
    Kirchhoff, Katrin
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2908 - 2911
  • [29] ENHANCEMENT AND ANALYSIS OF CONVERSATIONAL SPEECH: JSALT 2017
    Ryant, Neville
    Bergelson, Elika
    Church, Kenneth
    Cristia, Alejandrina
    Du, Jun
    Ganapathy, Sriram
    Khudanpur, Sanjeev
    Kowalski, Diana
    Krishnamoorthy, Mahesh
    Kulshreshta, Rajat
    Liberman, Mark
    Lu, Yu-Ding
    Maciejewski, Matthew
    Metze, Florian
    Profant, Jan
    Sun, Lei
    Tsao, Yu
    Yu, Zhou
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5154 - 5158
  • [30] Generating and evaluating segmentations for automatic speech recognition of conversational telephone speech
    Tranter, SE
    Yu, K
    Evermann, G
    Woodland, PC
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 753 - 756