Improving English Conversational Telephone Speech Recognition

Cited by: 13
Authors
Medennikov, Ivan [1, 2]
Prudnikov, Alexey [2, 3]
Zatvornitskiy, Alexander [1, 2, 3]
Affiliations
[1] STC Innovat Ltd, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] Speech Technol Ctr Ltd, St Petersburg, Russia
Keywords
conversational telephone speech recognition; deep neural networks; recurrent neural networks
DOI
10.21437/Interspeech.2016-473
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system. We investigated several techniques to improve acoustic modeling, namely speaker-dependent bottleneck features, deep Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks, data augmentation, and score fusion of DNN and BLSTM models. The training set consisted of the 300-hour Switchboard English speech corpus. We also examined hypothesis rescoring using language models based on recurrent neural networks. The resulting system achieves a word error rate of 7.8% on the Switchboard part of the HUB5 2000 evaluation set, which is a competitive result.
Pages: 2-6
Page count: 5