End-to-End Large Vocabulary Speech Recognition for the Serbian Language

Cited by: 6

Authors:

Popovic, Branislav [1,2]

Pakoci, Edvin [1,2]

Pekar, Darko [1,2]

Affiliations:

[1] Univ Novi Sad, Dept Power Elect & Telecommun Engn, Fac Tech Sci, Trg Dositeja Obradovica 6, Novi Sad 21000, Serbia

[2] AlfaNum Speech Technol, Bulevar Vojvode Stepe 40, Novi Sad 21000, Serbia

Source:

SPEECH AND COMPUTER, SPECOM 2017 | 2017 / Vol. 10458

Keywords:

Eesen; End-to-end; LSTM; Speech recognition; Serbian;

DOI:

10.1007/978-3-319-66429-3_33

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper presents the results of large vocabulary speech recognition for the Serbian language, developed using the Eesen end-to-end framework. Eesen trains a single deep recurrent neural network, consisting of several bidirectional long short-term memory (LSTM) layers, that models the mapping between speech and a set of context-independent lexicon units. This approach reduces the amount of expert knowledge required compared with developing other competitive speech recognition systems. Training is based on connectionist temporal classification (CTC), while decoding can use weighted finite-state transducers (WFSTs), which provides much faster and more efficient decoding than other similar systems. A corpus of approximately 215 h of audio data (about 171 h of speech and 44 h of silence, from 243 male and 239 female speakers) was used for training (about 90%) and testing (about 10%). On a set of more than 120000 words, a word error rate of 14.68% and a character error rate of 3.68% are achieved.
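
The two metrics reported in the abstract, word error rate (WER) and character error rate (CER), are conventionally defined as the edit distance between the reference and the recognized text, divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' scoring procedure; the function names and the Serbian example phrase are illustrative assumptions, and counting spaces as characters in the CER is one possible convention.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """CER: character-level edit distance divided by the reference length.

    Assumption: spaces are counted as characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one misrecognized character in one word.
    ref = "dobar dan svima"
    hyp = "dobar dam svima"
    print(f"WER = {word_error_rate(ref, hyp):.4f}")
    print(f"CER = {char_error_rate(ref, hyp):.4f}")
```

A single wrong character makes a whole word count as an error, which is why the WER is typically several times higher than the CER, as in the figures reported above.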


Pages: 343-352

Page count: 10
