Improving Acoustic Models for Russian Spontaneous Speech Recognition

Cited by: 10

Authors:

Prudnikov, Alexey ^{[1,2]}

Medennikov, Ivan ^{[2,3]}

Mendelev, Valentin ^{[1]}

Korenevsky, Maxim ^{[1,2]}

Khokhlov, Yuri ^{[3]}

Affiliations:

[1] Speech Technol Ctr Ltd, St Petersburg, Russia

[2] ITMO Univ, St Petersburg, Russia

[3] STC Innovat Ltd, St Petersburg, Russia

Source:

SPEECH AND COMPUTER (SPECOM 2015) | 2015 / Vol. 9319

Keywords:

Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; ADAPTATION;

DOI:

10.1007/978-3-319-23132-7_29

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The aim of the paper is to investigate ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset but obtained lower accuracy than the results for English spontaneous telephone speech. We found two methods to be especially useful for Russian spontaneous speech: i-vector based deep neural network adaptation and speaker-dependent bottleneck features, which provide 8.6% and 11.9% relative word error rate reduction over the baseline system, respectively.
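The "relative word error rate reduction" quoted in the abstract is conventionally computed as the drop in WER divided by the baseline WER. The sketch below illustrates that arithmetic only. The function name `relative_wer_reduction` and the input numbers are assumptions made for illustration; the absolute WER values of the paper's systems are not given in this record.

```python
def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction of a system over a baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# Hypothetical values for illustration only; they are not taken from the paper.
baseline = 20.0  # baseline system WER, in percent
improved = 18.0  # adapted system WER, in percent
print(f"{relative_wer_reduction(baseline, improved):.1f}% relative reduction")
```

Under this convention, a relative reduction is always measured against the baseline, so the same absolute gain counts for more when the baseline WER is lower.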

Pages: 234-242

Page count: 9
