End-to-End Large Vocabulary Speech Recognition for the Serbian Language

被引:6
|
作者
Popovic, Branislav [1 ,2 ]
Pakoci, Edvin [1 ,2 ]
Pekar, Darko [1 ,2 ]
机构
[1] Univ Novi Sad, Dept Power Elect & Telecommun Engn, Fac Tech Sci, Trg Dositeja Obradovica 6, Novi Sad 21000, Serbia
[2] AlfaNum Speech Technol, Bulevar Vojvode Stepe 40, Novi Sad 21000, Serbia
来源
关键词
Eesen; End-to-end; LSTM; Speech recognition; Serbian;
D O I
10.1007/978-3-319-66429-3_33
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
This paper presents the results of a large vocabulary speech recognition for the Serbian language, developed by using Eesen end-to-end framework. Eesen involves training a single deep recurrent neural network, containing a number of bidirectional long short-term memory layers, modeling the connection between the speech and a set of context-independent lexicon units. This approach reduces the amount of expert knowledge needed in order to develop other competitive speech recognition systems. The training is based on a connectionist temporal classification, while decoding allows the usage of weighted finite-state transducers. This provides much faster and more efficient decoding in comparison to other similar systems. A corpus of approximately 215 h of audio data (about 171 h of speech and 44 h of silence, or 243 male and 239 female speakers) was employed for the training (about 90%) and testing (about 10%) purposes. On a set of more than 120000 words, the word error rate of 14.68% and the character error rate of 3.68% is achieved.
引用
收藏
页码:343 / 352
页数:10
相关论文
共 50 条
  • [1] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
    Kim, Chanwoo
    Kim, Sungsoo
    Kim, Kwangyoun
    Kumar, Mehul
    Kim, Jiyeon
    Lee, Kyungmin
    Han, Changwoo
    Garg, Abhinav
    Kim, Eunhyang
    Shin, Minkyoo
    Singh, Shatrughan
    Heck, Larry
    Gowda, Dhananjaya
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569
  • [2] END-TO-END ATTENTION-BASED LARGE VOCABULARY SPEECH RECOGNITION
    Bandanau, Dzmitry
    Chorowski, Jan
    Serdyuk, Dmitriy
    Brakel, Philemon
    Bengio, Yoshua
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4945 - 4949
  • [3] End-to-End Speech Recognition of Tamil Language
    Changrampadi, Mohamed Hashim
    Shahina, A.
    Narayanan, M. Badri
    Khan, A. Nayeemulla
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (02): : 1309 - 1323
  • [4] End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow
    Variani, Ehsan
    Bagby, Tom
    McDermott, Erik
    Bacchiani, Michiel
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1641 - 1645
  • [5] MULTI-LEVEL LANGUAGE MODELING AND DECODING FOR OPEN VOCABULARY END-TO-END SPEECH RECOGNITION
    Hori, Takaaki
    Watanabe, Shinji
    Hershey, John R.
    [J]. 2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 287 - 293
  • [6] Residual Language Model for End-to-end Speech Recognition
    Tsunoo, Emiru
    Kashiwagi, Yosuke
    Narisetty, Chaitanya
    Watanabe, Shinji
    [J]. INTERSPEECH 2022, 2022, : 3899 - 3903
  • [7] TOWARDS LANGUAGE-UNIVERSAL END-TO-END SPEECH RECOGNITION
    Kim, Suyoun
    Seltzer, Michael L.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4914 - 4918
  • [8] ON TRAINING THE RECURRENT NEURAL NETWORK ENCODER-DECODER FOR LARGE VOCABULARY END-TO-END SPEECH RECOGNITION
    Lu, Liang
    Zhang, Xingxing
    Renals, Steve
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5060 - 5064
  • [9] LEVERAGING LANGUAGE ID IN MULTILINGUAL END-TO-END SPEECH RECOGNITION
    Waters, Austin
    Gaur, Neeraj
    Haghani, Parisa
    Moreno, Pedro
    Qu, Zhongdi
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 928 - 935
  • [10] Noise Robust End-to-End Speech Recognition For Bangla Language
    Sumit, Sakhawat Hosain
    Al Muntasir, Tareq
    Zaman, M. M. Arefin
    Nandi, Rabindra Nath
    Sourov, Tanvir
    [J]. 2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018,