Multi-Task Chinese Speech Recognition Method Based on the Squeezeformer Model

Cited by: 0
Authors
Guo, Ying [1]
Wang, Li [2]
Affiliations
[1] School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
[2] College of Computer Science and Technology Liaoning, Anshan, 114051, China
Keywords
Frequency modulation; Signal encoding; Speech enhancement
DOI
Not available
Chinese Library Classification (CLC) number
Subject classification code
Abstract
End-to-end training has become a prominent trend in speech recognition, and Conformer models effectively integrate Transformer and CNN architectures. However, their complexity and high computational cost make deployment difficult. To address these issues, we propose a multi-task Chinese speech recognition method based on the Squeezeformer model. We replace the FMCF structure in Conformer with an MF/CF structure, using the convolutional module as a local Multi-Head Attention (MHA) module to improve efficiency. Multilevel down-sampling and up-sampling with a time-series U-Net further reduce computational cost. By eliminating redundant LayerNorm layers and employing depthwise separable convolutions, we streamline the model, reduce its parameters, and lower deployment costs. An Adaptor Layer is integrated into the MHSA module to mitigate the vanishing gradient problem, and a ScaleVar Layer is added to enhance flexibility. Additionally, the RealFormer module is introduced on the decoding side to improve context understanding. Combining Connectionist Temporal Classification (CTC) with an attention-based encoder-decoder model for multi-task learning improves performance and accuracy. Experimental results show that, on the AISHELL-1 dataset, the proposed method reduces the number of parameters by 16% and lowers the character error rate to 5.50%. It also performs well on the AISHELL-2 dataset. © 2025 International Association of Engineers. All rights reserved.
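The multi-task objective mentioned in the abstract, which combines a CTC loss with an attention-decoder loss, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`ctc_nll`, `attention_nll`, `joint_loss`), the use of blank index 0, and the interpolation weight `ctc_weight=0.3` are assumptions chosen for illustration, and the abstract does not state how the two losses are weighted.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_nll(frame_log_probs, target, blank=0):
    """CTC negative log-likelihood via the forward algorithm.

    frame_log_probs: one list of log-probabilities per encoder frame.
    target: list of label indices (without blanks).
    """
    # Extended label sequence: blank, y1, blank, y2, ..., blank
    ext = [blank]
    for token in target:
        ext += [token, blank]

    # Initialise with the first frame: start in the leading blank or in y1.
    alpha = [NEG_INF] * len(ext)
    alpha[0] = frame_log_probs[0][ext[0]]
    if len(ext) > 1:
        alpha[1] = frame_log_probs[0][ext[1]]

    for frame in frame_log_probs[1:]:
        prev, alpha = alpha, [NEG_INF] * len(ext)
        for s, label in enumerate(ext):
            paths = [prev[s]]                      # stay on the same symbol
            if s >= 1:
                paths.append(prev[s - 1])          # advance by one position
            if s >= 2 and label != blank and label != ext[s - 2]:
                paths.append(prev[s - 2])          # skip the blank in between
            alpha[s] = logsumexp(paths) + frame[label]

    # Valid alignments end in the last label or the trailing blank.
    return -logsumexp(alpha[-2:])


def attention_nll(step_log_probs, target):
    """Summed cross-entropy of the decoder's per-step log-probabilities."""
    return -sum(step[token] for step, token in zip(step_log_probs, target))


def joint_loss(ctc_loss, att_loss, ctc_weight=0.3):
    """Interpolate the two task losses (ctc_weight is an assumed value)."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


# Toy example: 2 frames, vocabulary {0: blank, 1: a character}, uniform outputs.
uniform = [math.log(0.5), math.log(0.5)]
ctc = ctc_nll([uniform, uniform], target=[1])
att = attention_nll([uniform], target=[1])
loss = joint_loss(ctc, att)
```

In the toy example, three of the four possible two-frame alignments collapse to the target, so the CTC term equals -log(0.75). In a real system both terms would be computed from the shared encoder's outputs and back-propagated together.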
Pages: 23-31