End-to-End Speech Recognition Technology Based on Multi-Stream CNN

Times Cited: 0
Authors
Xiao, Hao [1 ]
Qiu, Yuan [1 ]
Fei, Rong [1 ]
Chen, Xiongbo [2 ]
Liu, Zuo [2 ]
Wu, Zongling [1 ]
Affiliations
[1] Xian Univ Technol, Coll Comp Sci & Engn, Xian, Peoples R China
[2] Xian Univ Technol, Guangxi CAIH Smart Telecom Tech Co Ltd, Xian, Guangxi, Peoples R China
Keywords
Speech Recognition; MCNN; Transformer; CTC;
DOI
10.1109/TrustCom56396.2022.00183
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline Classification Code: 0812
Abstract
As end-to-end speech recognition becomes increasingly popular, we study several end-to-end speech technologies and examine a Transformer-based speech recognition framework. We find that its multi-head self-attention is weak at capturing local features and that, when faced with noise in real-world scenes, training converges too slowly. To address these problems, a new speech recognition framework based on an MCNN-Transformer-CTC method is proposed. An MCNN (multi-stream convolutional neural network) in the front-end acoustic unit extracts local features through multiple parallel channels covering different time widths and spectral bands, compensating for the self-attention mechanism's weakness in local feature extraction, and a CTC branch is added through multi-task learning to alleviate the slow training convergence. On the Aishell1 dataset the model achieves a CER of 6.23%, a further improvement over the Transformer model.
Pages: 1310-1315
Number of pages: 6
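
Below is a minimal, illustrative PyTorch sketch of the MCNN-Transformer-CTC idea described in the abstract: several parallel CNN streams with different kernel sizes extract local time-frequency features, their outputs are fused and passed to a Transformer encoder, and a CTC head supports joint (multi-task) training. All layer sizes, kernel choices, names (MultiStreamCNN, MCNNTransformerCTC), and loss weights are assumptions made for illustration, not the authors' implementation or hyperparameters.

# Sketch only: an MCNN-Transformer-CTC style model; sizes/kernels are illustrative.
import torch
import torch.nn as nn


class MultiStreamCNN(nn.Module):
    """Parallel 2-D conv streams over (time, freq) with different kernel sizes."""

    def __init__(self, n_mels=80, d_model=256, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.streams = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(1, 32, k, stride=2, padding=k // 2),
                nn.ReLU(),
                nn.Conv2d(32, 32, k, stride=2, padding=k // 2),
                nn.ReLU(),
            )
            for k in kernel_sizes
        )
        freq_out = n_mels // 4  # two stride-2 convs (assumes n_mels divisible by 4)
        self.proj = nn.Linear(len(kernel_sizes) * 32 * freq_out, d_model)

    def forward(self, feats):                 # feats: (batch, time, n_mels)
        x = feats.unsqueeze(1)                # -> (batch, 1, time, n_mels)
        outs = [s(x) for s in self.streams]   # each: (batch, 32, time/4, n_mels/4)
        x = torch.cat(outs, dim=1)            # fuse streams on the channel axis
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        return self.proj(x)                   # (batch, time/4, d_model)


class MCNNTransformerCTC(nn.Module):
    """Multi-stream CNN front end + Transformer encoder + CTC output head."""

    def __init__(self, vocab_size, n_mels=80, d_model=256, nhead=4, num_layers=6):
        super().__init__()
        self.frontend = MultiStreamCNN(n_mels, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=1024, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        self.ctc_head = nn.Linear(d_model, vocab_size)  # blank assumed at index 0

    def forward(self, feats):
        enc = self.encoder(self.frontend(feats))
        return self.ctc_head(enc).log_softmax(dim=-1)   # (batch, time', vocab)


if __name__ == "__main__":
    model = MCNNTransformerCTC(vocab_size=4233)          # vocab size is illustrative
    feats = torch.randn(2, 400, 80)                      # 2 utterances, 400 frames
    log_probs = model(feats)                             # (2, 100, 4233)

    # In joint training, this CTC loss would be interpolated with an
    # attention-decoder cross-entropy loss (weighting is an assumption).
    targets = torch.randint(1, 4233, (2, 30))
    ctc = nn.CTCLoss(blank=0)
    loss = ctc(log_probs.transpose(0, 1),                # CTCLoss expects (T, B, C)
               targets,
               input_lengths=torch.full((2,), log_probs.size(1), dtype=torch.long),
               target_lengths=torch.full((2,), 30, dtype=torch.long))

The hypothetical joint objective would combine the CTC loss with the attention-decoder loss, e.g. loss = lambda * ctc + (1 - lambda) * attention; the interpolation weight is a common convention in joint CTC/attention training, not a value taken from the paper.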