End-to-End Large Vocabulary Speech Recognition for the Serbian Language

被引：6

作者：

Popovic, Branislav ^{[1
,2
]}

Pakoci, Edvin ^{[1
,2
]}

Pekar, Darko ^{[1
,2
]}

机构：

[1] Univ Novi Sad, Dept Power Elect & Telecommun Engn, Fac Tech Sci, Trg Dositeja Obradovica 6, Novi Sad 21000, Serbia

[2] AlfaNum Speech Technol, Bulevar Vojvode Stepe 40, Novi Sad 21000, Serbia

来源：

SPEECH AND COMPUTER, SPECOM 2017 | 2017年 / 10458卷

关键词：

Eesen; End-to-end; LSTM; Speech recognition; Serbian;

D O I：

10.1007/978-3-319-66429-3_33

中图分类号：

O42 [声学];

学科分类号：

070206 ; 082403 ;

摘要：

This paper presents the results of a large vocabulary speech recognition for the Serbian language, developed by using Eesen end-to-end framework. Eesen involves training a single deep recurrent neural network, containing a number of bidirectional long short-term memory layers, modeling the connection between the speech and a set of context-independent lexicon units. This approach reduces the amount of expert knowledge needed in order to develop other competitive speech recognition systems. The training is based on a connectionist temporal classification, while decoding allows the usage of weighted finite-state transducers. This provides much faster and more efficient decoding in comparison to other similar systems. A corpus of approximately 215 h of audio data (about 171 h of speech and 44 h of silence, or 243 male and 239 female speakers) was employed for the training (about 90%) and testing (about 10%) purposes. On a set of more than 120000 words, the word error rate of 14.68% and the character error rate of 3.68% is achieved.

引用

页码：343 / 352

页数：10

共 50 条

[1] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
Kim, Chanwoo
Kim, Sungsoo
Kim, Kwangyoun
Kumar, Mehul
Kim, Jiyeon
Lee, Kyungmin
Han, Changwoo
Garg, Abhinav
Kim, Eunhyang
Shin, Minkyoo
Singh, Shatrughan
Heck, Larry
Gowda, Dhananjaya
[J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569
[2] END-TO-END ATTENTION-BASED LARGE VOCABULARY SPEECH RECOGNITION
Bandanau, Dzmitry
Chorowski, Jan
Serdyuk, Dmitriy
Brakel, Philemon
Bengio, Yoshua
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4945 - 4949
[3] End-to-End Speech Recognition of Tamil Language
Changrampadi, Mohamed Hashim
Shahina, A.
Narayanan, M. Badri
Khan, A. Nayeemulla
[J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (02): : 1309 - 1323
[4] End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow
Variani, Ehsan
Bagby, Tom
McDermott, Erik
Bacchiani, Michiel
[J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1641 - 1645
[5] MULTI-LEVEL LANGUAGE MODELING AND DECODING FOR OPEN VOCABULARY END-TO-END SPEECH RECOGNITION
Hori, Takaaki
Watanabe, Shinji
Hershey, John R.
[J]. 2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 287 - 293
[6] Residual Language Model for End-to-end Speech Recognition
Tsunoo, Emiru
Kashiwagi, Yosuke
Narisetty, Chaitanya
Watanabe, Shinji
[J]. INTERSPEECH 2022, 2022, : 3899 - 3903
[7] TOWARDS LANGUAGE-UNIVERSAL END-TO-END SPEECH RECOGNITION
Kim, Suyoun
Seltzer, Michael L.
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4914 - 4918
[8] ON TRAINING THE RECURRENT NEURAL NETWORK ENCODER-DECODER FOR LARGE VOCABULARY END-TO-END SPEECH RECOGNITION
Lu, Liang
Zhang, Xingxing
Renals, Steve
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5060 - 5064
[9] LEVERAGING LANGUAGE ID IN MULTILINGUAL END-TO-END SPEECH RECOGNITION
Waters, Austin
Gaur, Neeraj
Haghani, Parisa
Moreno, Pedro
Qu, Zhongdi
[J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 928 - 935
[10] Noise Robust End-to-End Speech Recognition For Bangla Language
Sumit, Sakhawat Hosain
Al Muntasir, Tareq
Zaman, M. M. Arefin
Nandi, Rabindra Nath
Sourov, Tanvir
[J]. 2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018,

← 1 2 3 4 5 →