Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin

Cited by: 6
Authors
Dong, Linhao [1, 2]
Zhou, Shiyu [1, 2]
Chen, Wei [1 ]
Xu, Bo [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Keywords
speech recognition; recurrent neural aligner; Mandarin; end-to-end
DOI
10.21437/Interspeech.2018-1086
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
End-to-end models have shown superiority in Automatic Speech Recognition (ASR). At the same time, streaming recognition capability has become a growing requirement for end-to-end models. Following these trends, an encoder-decoder recurrent neural network called the Recurrent Neural Aligner (RNA) was recently proposed and has shown its competitiveness on two English ASR tasks. However, it is not clear whether RNA can be further improved and applied to other spoken languages. In this work, we explore the applicability of RNA to Mandarin Chinese and present four effective extensions. In the encoder, we redesign the temporal down-sampling and introduce a powerful convolutional structure. In the decoder, we utilize a regularizer to smooth the output distribution and conduct joint training with a language model. On two Mandarin Chinese conversational telephone speech recognition (MTS) datasets, our Extended-RNA obtains promising performance. In particular, it achieves a 27.7% character error rate (CER), which is superior to the current state-of-the-art result on the popular HKUST task.
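For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how character error rate is commonly computed: the character-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The function names `edit_distance` and `cer` and the sample strings are illustrative assumptions; this record does not describe the scoring procedure the authors actually used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (hypothetical helper)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance over reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative strings only, not taken from any dataset in the paper.
reference = "今天天气很好"
hypothesis = "今天天气好"
print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because the score is computed over characters rather than words, it does not require word segmentation of the Mandarin text, which is one reason CER is the usual metric for Mandarin ASR.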
Pages: 816-820
Number of pages: 5
Related papers
50 in total
  • [1] Segmental Recurrent Neural Networks for End-to-end Speech Recognition
    Lu, Liang
    Kong, Lingpeng
    Dyer, Chris
    Smith, Noah A.
    Renals, Steve
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 385 - 389
  • [2] Towards End-to-End Speech Recognition with Recurrent Neural Networks
    Graves, Alex
    Jaitly, Navdeep
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 1764 - 1772
  • [3] Review of End-to-End Streaming Speech Recognition
    Wang, Aohui
    Zhang, Long
    Song, Wenyu
    Meng, Jie
    [J]. Computer Engineering and Applications, 2024, 59 (02) : 22 - 33
  • [4] STREAMING END-TO-END SPEECH RECOGNITION WITH JOINTLY TRAINED NEURAL FEATURE ENHANCEMENT
    Kim, Chanwoo
    Garg, Abhinav
    Gowda, Dhananjaya
    Mun, Seongkyu
    Han, Changwoo
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6773 - 6777
  • [5] Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
    Amodei, Dario
    Ananthanarayanan, Sundaram
    Anubhai, Rishita
    Bai, Jingliang
    Battenberg, Eric
    Case, Carl
    Casper, Jared
    Catanzaro, Bryan
    Cheng, Qiang
    Chen, Guoliang
    Chen, Jie
    Chen, Jingdong
    Chen, Zhijie
    Chrzanowski, Mike
    Coates, Adam
    Diamos, Greg
    Ding, Ke
    Du, Niandong
    Elsen, Erich
    Engel, Jesse
    Fang, Weiwei
    Fan, Linxi
    Fougner, Christopher
    Gao, Liang
    Gong, Caixia
    Hannun, Awni
    Han, Tony
    Johannes, Lappi Vaino
    Jiang, Bing
    Ju, Cai
    Jun, Billy
    LeGresley, Patrick
    Lin, Libby
    Liu, Junjie
    Liu, Yang
    Li, Weigao
    Li, Xiangang
    Ma, Dongpeng
    Narang, Sharan
    Ng, Andrew
    Ozair, Sherjil
    Peng, Yiping
    Prenger, Ryan
    Qian, Sheng
    Quan, Zongfeng
    Raiman, Jonathan
    Rao, Vinay
    Satheesh, Sanjeev
    Seetapun, David
    Sengupta, Shubho
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [6] STREAMING END-TO-END SPEECH RECOGNITION FOR MOBILE DEVICES
    He, Yanzhang
    Sainath, Tara N.
    Prabhavalkar, Rohit
    McGraw, Ian
    Alvarez, Raziel
    Zhao, Ding
    Rybach, David
    Kannan, Anjuli
    Wu, Yonghui
    Pang, Ruoming
    Liang, Qiao
    Bhatia, Deepti
    Yuan Shangguan
    Li, Bo
    Pundak, Golan
    Sim, Khe Chai
    Bagby, Tom
    Chang, Shuo-yiin
    Rao, Kanishka
    Gruenstein, Alexander
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6381 - 6385
  • [7] Lightweight End-to-End Architecture for Streaming Speech Recognition
    Yang S.
    Li X.
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2023, 36 (03) : 268 - 279
  • [8] Semantic Data Augmentation for End-to-End Mandarin Speech Recognition
    Sun, Jianwei
    Tang, Zhiyuan
    Yin, Hengxin
    Wang, Wei
    Zhao, Xi
    Zhao, Shuaijiang
    Lei, Xiaoning
    Zou, Wei
    Li, Xiangang
    [J]. INTERSPEECH 2021, 2021, : 1269 - 1273
  • [9] End-to-End Mandarin Speech Recognition Combining CNN and BLSTM
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    [J]. SYMMETRY-BASEL, 2019, 11 (05):
  • [10] Streaming End-to-End Multi-Talker Speech Recognition
    Lu, Liang
    Kanda, Naoyuki
    Li, Jinyu
    Gong, Yifan
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 803 - 807