Efficiency of End-to-End Speech Recognition for Languages with Scarce Resources

Cited by: 0
Authors
Rudzionis, Vytautas [1]
Malukas, Ugnius [2]
Lopata, Audrius [2]
Affiliations
[1] Vilnius Univ, Kaunas Fac, Muitines 8, Kaunas, Lithuania
[2] Kaunas Univ Technol, Studentu 50, Kaunas, Lithuania
Keywords
Speech recognition; Hybrid methods; Machine learning; End-to-end speech recognition
DOI
10.1007/978-3-031-16302-9_20
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Modern deep-learning-based speech recognition methods can achieve very high recognition accuracy, but training such systems requires enormous amounts of data. Many less widely spoken languages simply do not have speech corpora of the necessary size. The paper evaluates the efficiency of DeepSpeech-based speech recognition when only limited training data are available and examines ways to improve accuracy. The experiments showed that a DeepSpeech2 recognizer trained on about 100 h of speech achieves only modest accuracy, but applying simple grammatical constraints reduced the word error rate to 23-25%.
Pages: 259-264
Page count: 6