Improving Acoustic Models for Russian Spontaneous Speech Recognition

Cited by: 10
Authors
Prudnikov, Alexey [1 ,2 ]
Medennikov, Ivan [2 ,3 ]
Mendelev, Valentin [1 ]
Korenevsky, Maxim [1 ,2 ]
Khokhlov, Yuri [3 ]
Affiliations
[1] Speech Technol Ctr Ltd, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] STC Innovat Ltd, St Petersburg, Russia
Source
Keywords
Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; Adaptation
DOI
10.1007/978-3-319-23132-7_29
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset but obtained lower accuracy than the results for English spontaneous telephone speech. Two methods proved especially useful for Russian spontaneous speech: i-vector based deep neural network adaptation and speaker-dependent bottleneck features, which provide 8.6% and 11.9% relative word error rate reduction over the baseline system, respectively.
Pages: 234-242
Page count: 9
Related papers
50 in total
  • [1] Acoustic and Language Models Adaptation for Indonesian Spontaneous Speech Recognition
    Lestari, Dessi Puji
    Irfani, Angela
    2015 2ND INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS: CONCEPTS, THEORY AND APPLICATIONS ICAICTA, 2015
  • [2] Experimenting with Hybrid TDNN/HMM Acoustic Models for Russian Speech Recognition
    Kipyatkova, Irina
    SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 362 - 369
  • [3] Towards Robust Indonesian Speech Recognition with Spontaneous-Speech Adapted Acoustic Models
    Hoesen, Devin
    Satriawan, Cil Hardianto
    Lestari, Dessi Puji
    Khodra, Masayu Leylia
    SLTU-2016 5TH WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGIES FOR UNDER-RESOURCED LANGUAGES, 2016, 81 : 167 - 173
  • [4] Specific acoustic models for spontaneous and dictated style in Indonesian speech recognition
    Vista, C. B.
    Satriawan, C. H.
    Lestari, D. P.
    Widyantoro, D. H.
    2ND INTERNATIONAL CONFERENCE ON COMPUTING AND APPLIED INFORMATICS 2017, 2018, 978
  • [5] Improving Latency-Controlled BLSTM Acoustic Models for Online Speech Recognition
    Xue, Shaofei
    Yan, Zhijie
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5340 - 5344
  • [6] Interpolation of Acoustic Models for Speech Recognition
    Fraga-Silva, Thiago
    Gauvain, Jean-Luc
    Lamel, Lori
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3346 - 3350
  • [7] Advances in STC Russian Spontaneous Speech Recognition System
    Medennikov, Ivan
    Prudnikov, Alexey
    SPEECH AND COMPUTER, 2016, 9811 : 116 - 123
  • [8] Improving Acoustic Models for Dysarthric Speech Recognition using Time Delay Neural Networks
    Misbullah, Alim
    Lin, Hai-Hsing
    Chang, Chia-Yuan
    Yeh, Hsiu-Wei
    Weng, Ko-Cheng
    2020 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS (ICELTICS 2020), 2020, : 118 - 121
  • [9] Improving Discriminative Training for Robust Acoustic Models in Large Vocabulary Continuous Speech Recognition
    Pylkkonen, Janne
    Kurimo, Mikko
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1210 - 1213
  • [10] Improving Speech Recognition through Automatic Selection of Age Group - Specific Acoustic Models
    Haemaelaeinen, Annika
    Meinedo, Hugo
    Tjalve, Michael
    Pellegrini, Thomas
    Trancoso, Isabel
    Dias, Miguel Sales
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, 2014, 8775 : 12 - 23