End-to-End Large Vocabulary Speech Recognition for the Serbian Language

被引:6
|
作者
Popovic, Branislav [1 ,2 ]
Pakoci, Edvin [1 ,2 ]
Pekar, Darko [1 ,2 ]
机构
[1] Univ Novi Sad, Dept Power Elect & Telecommun Engn, Fac Tech Sci, Trg Dositeja Obradovica 6, Novi Sad 21000, Serbia
[2] AlfaNum Speech Technol, Bulevar Vojvode Stepe 40, Novi Sad 21000, Serbia
来源
关键词
Eesen; End-to-end; LSTM; Speech recognition; Serbian;
D O I
10.1007/978-3-319-66429-3_33
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
This paper presents the results of a large vocabulary speech recognition for the Serbian language, developed by using Eesen end-to-end framework. Eesen involves training a single deep recurrent neural network, containing a number of bidirectional long short-term memory layers, modeling the connection between the speech and a set of context-independent lexicon units. This approach reduces the amount of expert knowledge needed in order to develop other competitive speech recognition systems. The training is based on a connectionist temporal classification, while decoding allows the usage of weighted finite-state transducers. This provides much faster and more efficient decoding in comparison to other similar systems. A corpus of approximately 215 h of audio data (about 171 h of speech and 44 h of silence, or 243 male and 239 female speakers) was employed for the training (about 90%) and testing (about 10%) purposes. On a set of more than 120000 words, the word error rate of 14.68% and the character error rate of 3.68% is achieved.
引用
收藏
页码:343 / 352
页数:10
相关论文
共 50 条
  • [21] Multichannel End-to-end Speech Recognition
    Ochiai, Tsubasa
    Watanabe, Shinji
    Hori, Takaaki
    Hershey, John R.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [22] End-to-end Accented Speech Recognition
    Viglino, Thibault
    Motlicek, Petr
    Cernak, Milos
    [J]. INTERSPEECH 2019, 2019, : 2140 - 2144
  • [23] END-TO-END AUDIOVISUAL SPEECH RECOGNITION
    Petridis, Stavros
    Stafylakis, Themos
    Ma, Pingchuan
    Cai, Feipeng
    Tzimiropoulos, Georgios
    Pantic, Maja
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6548 - 6552
  • [24] END-TO-END ANCHORED SPEECH RECOGNITION
    Wang, Yiming
    Fan, Xing
    Chen, I-Fan
    Liu, Yuzong
    Chen, Tongfei
    Hoffmeister, Bjorn
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7090 - 7094
  • [25] OPTIMIZING ALIGNMENT OF SPEECH AND LANGUAGE LATENT SPACES FOR END-TO-END SPEECH RECOGNITION AND UNDERSTANDING
    Wang, Wei
    Ren, Shuo
    Qian, Yao
    Liu, Shujie
    Shi, Yu
    Qian, Yanmin
    Zeng, Michael
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7802 - 7806
  • [26] Large Margin Training for Attention Based End-to-End Speech Recognition
    Wang, Peidong
    Cui, Jia
    Weng, Chao
    Yu, Dong
    [J]. INTERSPEECH 2019, 2019, : 246 - 250
  • [27] On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition
    Li, Jinyu
    Wu, Yu
    Gaur, Yashesh
    Wang, Chengyi
    Zhao, Rui
    Liu, Shujie
    [J]. INTERSPEECH 2020, 2020, : 1 - 5
  • [28] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [29] END-TO-END SPEECH RECOGNITION WITH WORD-BASED RNN LANGUAGE MODELS
    Hori, Takaaki
    Cho, Jaejin
    Watanabe, Shinji
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 389 - 396
  • [30] ADVERSARIAL TRAINING OF END-TO-END SPEECH RECOGNITION USING A CRITICIZING LANGUAGE MODEL
    Liu, Alexander H.
    Lee, Hung-yi
    Lee, Lin-shan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6176 - 6180