THE MICROSOFT 2017 CONVERSATIONAL SPEECH RECOGNITION SYSTEM

Cited by: 0
Authors
Xiong, W. [1]
Wu, L. [1]
Alleva, F. [1]
Droppo, J. [1]
Huang, X. [1]
Stolcke, A. [1]
Affiliations
[1] Microsoft AI & Res, Redmond, WA 98052 USA
Keywords
Conversational speech recognition; CNN; LACE; BLSTM; LSTM-LM; system combination; human parity; NEURAL-NETWORKS; BACKPROPAGATION; LSTM; IBM
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We describe the latest version of Microsoft's conversational speech recognition system for the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby acoustic model posteriors are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added another language model rescoring step following the confusion network combination. The resulting system yields a 5.1% word error rate on the NIST 2000 Switchboard test set, and 9.8% on the CallHome subset.
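The first combination stage named in the abstract, merging acoustic model posteriors at the senone/frame level, can be pictured with a minimal sketch. The abstract does not say how the posteriors are merged, so the weighted linear average below is an assumption, and `combine_posteriors`, its arguments, and the toy numbers are hypothetical names chosen only for illustration.

```python
from typing import List, Sequence

# Posteriors[t][s] = posterior of senone s at frame t for one acoustic model.
Posteriors = List[List[float]]


def combine_posteriors(models: Sequence[Posteriors],
                       weights: Sequence[float]) -> Posteriors:
    """Weighted linear average of per-frame senone posteriors.

    Assumption: linear interpolation with normalized weights. The source
    only states that combination happens at the senone/frame level.
    """
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    n_frames = len(models[0])
    if any(len(m) != n_frames for m in models):
        raise ValueError("all models must cover the same frames")

    combined: Posteriors = []
    for t in range(n_frames):
        n_senones = len(models[0][t])
        frame = [0.0] * n_senones
        for w, model in zip(norm, models):
            for s in range(n_senones):
                frame[s] += w * model[t][s]
        combined.append(frame)
    return combined


# Toy usage: two models, two frames, two senones.
model_a = [[0.8, 0.2], [0.1, 0.9]]
model_b = [[0.4, 0.6], [0.3, 0.7]]
merged = combine_posteriors([model_a, model_b], [1.0, 1.0])
```

Because each input frame is a probability distribution and the weights are normalized, every merged frame is again a distribution. The second stage, word-level voting via confusion networks, operates on decoded hypotheses and is not covered by this sketch.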
Pages: 5934 - 5938
Page count: 5
Related papers
50 in total
  • [1] THE MICROSOFT 2016 CONVERSATIONAL SPEECH RECOGNITION SYSTEM
    Xiong, W.
    Droppo, J.
    Huang, X.
    Seide, F.
    Seltzer, M.
    Stolcke, A.
    Yu, D.
    Zweig, G.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5255 - 5259
  • [2] ACOUSTIC PROCESSOR IN A CONVERSATIONAL SPEECH RECOGNITION SYSTEM
    NAKATSU, R
    KOHDA, M
    [J]. REVIEW OF THE ELECTRICAL COMMUNICATIONS LABORATORIES, 1978, 26 (11-1): : 1486 - 1504
  • [3] LINGUISTIC PROCESSOR IN A CONVERSATIONAL SPEECH RECOGNITION SYSTEM
    SHIKANO, K
    KOHDA, M
    [J]. REVIEW OF THE ELECTRICAL COMMUNICATIONS LABORATORIES, 1978, 26 (11-1): : 1505 - 1520
  • [4] BUT OpenSAT 2017 speech recognition system
    Karafiat, Martin
    Baskar, Murali Karthick
    Szoke, Igor
    Malenovsky, Vladimir
    Vesely, Karel
    Grezl, Frantisek
    Burget, Lukas
    Cernocky, Jan Honza
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2638 - 2642
  • [5] Conversational telephone speech recognition
    Gauvain, JL
    Lamel, L
    Schwenk, H
    Adda, G
    Chen, L
    Lefèvre, F
    [J]. 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 212 - 215
  • [6] The IBM 2016 English Conversational Telephone Speech Recognition System
    Saon, George
    Sercu, Tom
    Rennie, Steven
    Kuo, Hong-Kwang J.
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 7 - 11
  • [7] The IBM 2015 English Conversational Telephone Speech Recognition System
    Saon, George
    Kuo, Hong-Kwang J.
    Rennie, Steven
    Picheny, Michael
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3140 - 3144
  • [8] Recognition of Interest in Human Conversational Speech
    Schuller, Bjoern
    Koehler, Niels
    Mueller, Ronald
    Rigoll, Gerhard
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 793 - 796
  • [9] Conversational telephone speech recognition for Lithuanian
    Lileikyte, Rasa
    Lamel, Lori
    Gauvain, Jean-Luc
    Gorin, Arseniy
    [J]. COMPUTER SPEECH AND LANGUAGE, 2018, 49 : 71 - 82
  • [10] Improvements in recognition of conversational telephone speech
    Peskin, B
    Newman, M
    McAllaster, D
    Nagesha, V
    Richards, H
    Wegmann, S
    Hunt, M
    Gillick, L
    [J]. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 53 - 56