Improving English Conversational Telephone Speech Recognition

Cited by: 13
Authors
Medennikov, Ivan [1, 2]
Prudnikov, Alexey [2, 3]
Zatvornitskiy, Alexander [1, 2, 3]
Affiliations
[1] STC Innovat Ltd, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] Speech Technol Ctr Ltd, St Petersburg, Russia
Keywords
conversational telephone speech recognition; deep neural networks; recurrent neural networks
DOI
10.21437/Interspeech.2016-473
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system. We investigated several techniques to improve acoustic modeling, namely speaker-dependent bottleneck features, deep Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks, data augmentation, and score fusion of DNN and BLSTM models. The training set consisted of the 300-hour Switchboard English speech corpus. We also examined hypothesis rescoring using language models based on recurrent neural networks. The resulting system achieves a word error rate of 7.8% on the Switchboard part of the HUB5 2000 evaluation set, which is a competitive result.
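The abstract reports its result as a word error rate (WER). As a minimal sketch of how that metric is conventionally defined, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in the minimum-edit alignment of the hypothesis against the reference, and N is the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. Evaluation-grade scoring normally applies text normalization that is not shown here, and this is not the authors' scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: tokens are obtained by splitting on whitespace,
    with no text normalization applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors while N is the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.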
Pages: 2-6
Number of pages: 5