Investigating Joint CTC-Attention Models for End-to-End Russian Speech Recognition

Cited by: 3

Authors
Markovnikov, Nikita [1 ,2 ]
Kipyatkova, Irina [1 ,3 ]
Affiliations
[1] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] St Petersburg State Univ Aerosp Instrumentat SUAI, St Petersburg, Russia
Funding
Russian Foundation for Basic Research;
Keywords
End-to-end models; Attention mechanism; Deep learning; Russian speech; Speech recognition;
DOI
10.1007/978-3-030-26061-3_35
Chinese Library Classification
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
We propose an application of attention-based models to automatic recognition of continuous Russian speech. We experimented with three types of attention mechanism, with data augmentation based on tempo and pitch perturbations, and with a beam search pruning method. Moreover, we propose using the sparsemax function as the probability-distribution generator for the attention mechanism. We experimented with joint CTC-Attention encoder-decoder models that use deep convolutional networks to compress input features or waveform spectrograms, and with a Highway LSTM model as the encoder. Experiments were performed on a small Russian speech dataset with a total duration of more than 60 h. The proposed methods improved recognition accuracy, and the beam search optimization method yielded faster speech decoding.
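The abstract proposes sparsemax as the attention probability-distribution generator. As context, the following is a minimal NumPy sketch of the sparsemax transformation (Martins and Astudillo, 2016), which projects scores onto the probability simplex and, unlike softmax, can assign exactly zero weight to some inputs. This is an illustrative implementation, not the authors' code, and the function name is our own.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of the score vector z onto the
    probability simplex. Produces sparse attention weights (exact zeros)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # support: indices kept with nonzero probability
    support = 1 + k * z_sorted > cumsum
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold
    return np.maximum(z - tau, 0.0)

# For a peaked score vector, sparsemax zeroes out weak candidates,
# whereas softmax would assign them small nonzero mass.
print(sparsemax([2.0, 1.0, 0.1]))  # → [1. 0. 0.]
```

In an attention mechanism, this function would replace softmax over the per-frame alignment scores, concentrating the attention distribution on a few encoder frames.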
Pages: 337-347 (11 pages)