Improving Acoustic Models for Russian Spontaneous Speech Recognition

Cited by: 10
Authors
Prudnikov, Alexey [1, 2]
Medennikov, Ivan [2, 3]
Mendelev, Valentin [1]
Korenevsky, Maxim [1, 2]
Khokhlov, Yuri [3]
Affiliations
[1] Speech Technology Center Ltd, St. Petersburg, Russia
[2] ITMO University, St. Petersburg, Russia
[3] STC-innovations Ltd, St. Petersburg, Russia
Keywords
Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; Adaptation
DOI
10.1007/978-3-319-23132-7_29
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset but obtained low accuracy compared with the results for English spontaneous telephone speech. Two methods proved especially useful for Russian spontaneous speech: i-vector-based deep neural network adaptation and speaker-dependent bottleneck features, which provide 8.6% and 11.9% relative word error rate reduction over the baseline system, respectively.
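The relative reductions quoted in the abstract, and the idea of feeding speaker information to a DNN, can be illustrated with a short Python sketch. It assumes the conventional definition of relative WER reduction, 100 * (WER_baseline - WER_new) / WER_baseline, and one common form of i-vector based DNN adaptation in which a single per-speaker i-vector is appended to every acoustic input frame. The abstract does not give the authors' exact configuration, so the function names `relative_wer_reduction` and `append_ivector`, the feature dimensions, and the WER values in the usage note are hypothetical; this is not presented as the paper's implementation.

```python
import numpy as np


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent (assumed conventional definition).

    Computes 100 * (baseline - new) / baseline, so a drop from 40% to 36%
    is a 10% relative (but only 4 points absolute) reduction.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def append_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Append one speaker-level i-vector to every acoustic frame.

    Hypothetical helper illustrating one common i-vector adaptation scheme,
    not necessarily the paper's setup.
    frames:  (num_frames, feat_dim) acoustic features, e.g. filterbanks
    ivector: (ivector_dim,) vector estimated for the frames' speaker
    returns: (num_frames, feat_dim + ivector_dim) DNN input matrix
    """
    if frames.ndim != 2 or ivector.ndim != 1:
        raise ValueError("expected 2-D frames and a 1-D i-vector")
    # Repeat the same i-vector on every row so each frame carries speaker info.
    tiled = np.tile(ivector, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)


if __name__ == "__main__":
    # Hypothetical WER values, not results from the paper.
    print(relative_wer_reduction(40.0, 36.0))  # 10.0
    # Hypothetical dimensions: 5 frames of 40-dim features, 100-dim i-vector.
    x = append_ivector(np.zeros((5, 40)), np.ones(100))
    print(x.shape)  # (5, 140)
```

For example, with hypothetical WERs of 40.0% for a baseline and 36.0% for an adapted system, `relative_wer_reduction(40.0, 36.0)` returns `10.0`: a 10% relative but only 4.0 absolute reduction. Because the i-vector is constant within a speaker, the network can learn a speaker-dependent shift of its input without any per-speaker retraining.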
Pages: 234-242
Number of pages: 9
Related papers
50 in total
  • [41] Decision tree-based acoustic models for speech recognition
    Akamine, Masami; Ajmera, Jitendra
    EURASIP Journal on Audio, Speech, and Music Processing, 2012
  • [42] Multilingual acoustic models for the recognition of non-native speech
    Fischer, V.; Janke, E.; Kunzmann, S.; Ross, T.
    ASRU 2001: IEEE Workshop on Automatic Speech Recognition and Understanding, Conference Proceedings, 2001, pp. 331-334
  • [43] Context-dependent acoustic models for Chinese speech recognition
    Ma, B.; Huang, T. Y.; Xu, B.; Zhang, X. J.; Qu, F.
    1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Conference Proceedings, Vols. 1-6, 1996, pp. 455-458
  • [44] Boosting HMM acoustic models in large vocabulary speech recognition
    Meyer, C.; Schramm, H.
    Speech Communication, 2006, 48(5), pp. 532-548
  • [46] Context-independent acoustic models for Thai speech recognition
    Kasuriya, S.; Kanokphara, S.; Thatphithakkul, N.; Cotsomrong, P.; Sunpethniyom, T.
    IEEE International Symposium on Communications and Information Technologies 2004 (ISCIT 2004), Proceedings, Vols. 1 and 2: Smart Info-Media Systems, 2004, pp. 991-994
  • [47] A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition
    Xiao, Xiong; Li, Jinyu; Chng, Eng Siong; Li, Haizhou; Lee, Chin-Hui
    IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(6), pp. 1158-1169
  • [48] Acoustic Modelling for Speech Recognition: Hidden Markov Models and Beyond?
    Gales, M. J. F.
    2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009), 2009, p. 44
  • [49] Building DNN acoustic models for large vocabulary speech recognition
    Maas, Andrew L.; Qi, Peng; Xie, Ziang; Hannun, Awni Y.; Lengerich, Christopher T.; Jurafsky, Daniel; Ng, Andrew Y.
    Computer Speech and Language, 2017, 41, pp. 195-213
  • [50] Improving speech recognition using data augmentation and acoustic model fusion
    Rebai, Ilyes; BenAyed, Yessine; Mahdi, Walid; Lorre, Jean-Pierre
    Knowledge-Based and Intelligent Information & Engineering Systems, 2017, 112, pp. 316-322