End-to-End Speech Recognition in Russian

被引:1
|
作者
Markovnikov, Nikita [1 ]
Kipyatkova, Irina [1 ]
Lyakso, Elena [1 ]
机构
[1] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg, Russia
来源
基金
俄罗斯科学基金会;
关键词
End-to-end models; Deep learning; Russian speech; Speech recognition;
D O I
10.1007/978-3-319-99579-3_40
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
End-to-end speech recognition systems incorporating deep neural networks (DNNs) have achieved good results. We propose applying CTC (Connectionist Temporal Classification) models and attention-based encoder-decoder in automatic recognition of the Russian continuous speech. We used different neural network models such Long short-term memory (LSTM), bidirectional LSTM and Residual Networks to provide experiments. We got recognition accuracy a bit worse than hybrid models but our models can work without large language model and they showed better performance in terms of average decoding speed that can be helpful in real systems. Experiments are performed with extra-large vocabulary (more than 150K words) of Russian speech.
引用
收藏
页码:377 / 386
页数:10
相关论文
共 50 条
  • [1] Investigation of Transfer Learning for End-to-End Russian Speech Recognition
    Kipyatkova, Irina
    [J]. SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 349 - 357
  • [2] END-TO-END MULTIMODAL SPEECH RECOGNITION
    Palaskar, Shruti
    Sanabria, Ramon
    Metze, Florian
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5774 - 5778
  • [3] Overview of end-to-end speech recognition
    Wang, Song
    Li, Guanyu
    [J]. 2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
  • [4] END-TO-END AUDIOVISUAL SPEECH RECOGNITION
    Petridis, Stavros
    Stafylakis, Themos
    Ma, Pingchuan
    Cai, Feipeng
    Tzimiropoulos, Georgios
    Pantic, Maja
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6548 - 6552
  • [5] Multichannel End-to-end Speech Recognition
    Ochiai, Tsubasa
    Watanabe, Shinji
    Hori, Takaaki
    Hershey, John R.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [6] End-to-end Accented Speech Recognition
    Viglino, Thibault
    Motlicek, Petr
    Cernak, Milos
    [J]. INTERSPEECH 2019, 2019, : 2140 - 2144
  • [7] END-TO-END ANCHORED SPEECH RECOGNITION
    Wang, Yiming
    Fan, Xing
    Chen, I-Fan
    Liu, Yuzong
    Chen, Tongfei
    Hoffmeister, Bjorn
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7090 - 7094
  • [8] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [9] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
    Kim, Chanwoo
    Kim, Sungsoo
    Kim, Kwangyoun
    Kumar, Mehul
    Kim, Jiyeon
    Lee, Kyungmin
    Han, Changwoo
    Garg, Abhinav
    Kim, Eunhyang
    Shin, Minkyoo
    Singh, Shatrughan
    Heck, Larry
    Gowda, Dhananjaya
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569
  • [10] An End-to-End model for Vietnamese speech recognition
    Van Huy Nguyen
    [J]. 2019 IEEE - RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF), 2019, : 307 - 312