Multi-Task Chinese Speech Recognition Method Based on the Squeezeformer Model

Times cited: 0
Authors
Guo, Ying [1]
Wang, Li [2]
Affiliations
[1] School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
[2] College of Computer Science and Technology Liaoning, Anshan 114051, China
Keywords
Frequency modulation; Signal encoding; Speech enhancement
DOI
Not available
Abstract
End-to-end training has become a prominent trend in speech recognition, and Conformer models effectively combine Transformer and CNN architectures. However, their complexity and high computational cost make them difficult to deploy. To address these issues, we propose a multi-task Chinese speech recognition method based on the Squeezeformer model. We replace the FMCF block structure of Conformer with an MF/CF structure, treating the convolution module as a local Multi-Head Attention (MHA) module to improve efficiency. Multilevel down-sampling and up-sampling with a time-series U-Net further reduce computational cost. By removing redundant LayerNorm layers and using depthwise separable convolutions, we streamline the model, reduce the number of parameters, and lower deployment cost. An Adaptor Layer is integrated into the MHSA module to mitigate the vanishing gradient problem, and a ScaleVar Layer is added to increase flexibility. In addition, a RealFormer module is introduced on the decoder side to improve context understanding. Combining Connectionist Temporal Classification (CTC) with an attention-based encoder-decoder model in a multi-task learning framework improves performance and accuracy. In experiments on the AISHELL-1 dataset, the proposed method reduces the number of parameters by 16% and lowers the character error rate (CER) to 5.50%. It also performs well on the AISHELL-2 dataset.
© 2025 International Association of Engineers. All rights reserved.
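The multilevel down-sampling and up-sampling idea can be illustrated with a minimal pure-Python sketch. It assumes frames are lists of floats, halves the time resolution by averaging adjacent frame pairs, runs the inner blocks at that reduced rate, then restores the original rate by repeating frames and adding the full-rate input back as a skip connection. The averaging and repetition operators and all function names are hypothetical illustrations; the paper's actual down-sampling and up-sampling layers may differ.

```python
from typing import Callable, List

Frame = List[float]


def downsample(frames: List[Frame]) -> List[Frame]:
    """Halve the time axis by averaging each pair of adjacent frames.

    Hypothetical operator; a trailing odd frame is kept as-is.
    """
    out: List[Frame] = []
    for t in range(0, len(frames), 2):
        pair = frames[t:t + 2]
        out.append([sum(vals) / len(pair) for vals in zip(*pair)])
    return out


def upsample(frames: List[Frame], target_len: int) -> List[Frame]:
    """Double the time axis by repeating each frame, then trim to target_len."""
    repeated = [f for f in frames for _ in range(2)]
    return repeated[:target_len]


def temporal_unet(frames: List[Frame],
                  middle: Callable[[List[Frame]], List[Frame]]) -> List[Frame]:
    """Run `middle` (hypothetical stand-in for inner encoder blocks) at half rate.

    The full-rate input is added back as a skip connection after up-sampling.
    """
    low = middle(downsample(frames))
    restored = upsample(low, len(frames))
    return [[a + b for a, b in zip(x, y)] for x, y in zip(frames, restored)]


def attention_cost(num_frames: int, dim: int) -> int:
    """Rough multiply count for computing the T x T attention score matrix."""
    return num_frames * num_frames * dim


if __name__ == "__main__":
    x = [[float(t), 1.0] for t in range(5)]   # 5 frames, 2 features each
    print(len(downsample(x)))                 # 3 frames at half rate
    y = temporal_unet(x, middle=lambda h: h)  # identity inner blocks
    print(len(y))                             # back to 5 frames
    print(attention_cost(400, 256) // attention_cost(200, 256))  # 4
```

Because self-attention cost grows with the square of the sequence length, running the middle blocks at half the frame rate cuts their score computation roughly fourfold.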
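Simple arithmetic shows why depthwise separable convolutions shrink the model. The sketch below counts weights for a 1-D convolution with `c_in` input channels, `c_out` output channels, and kernel size `k`, ignoring biases. The channel width 256 and kernel size 31 are assumed example values, not figures from the paper.

```python
def standard_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a regular 1-D convolution: each output channel sees every input channel."""
    return c_in * c_out * k


def depthwise_separable_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise conv (one k-tap filter per input channel) followed by a 1x1 pointwise conv."""
    depthwise = c_in * k
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    c, k = 256, 31  # assumed example sizes, not taken from the paper
    std = standard_conv1d_params(c, c, k)
    sep = depthwise_separable_conv1d_params(c, c, k)
    print(std, sep, round(std / sep, 1))  # 2031616 73472 27.7
```

The reduction factor is `k * c_out / (k + c_out)`, so the saving grows with both kernel size and channel width.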
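RealFormer's central idea, as described in its original paper, is a residual connection on the raw (pre-softmax) attention scores between consecutive layers: each layer adds the previous layer's scores to its own before applying softmax. The following is a minimal single-head, pure-Python sketch. It omits the Q/K/V projections, multiple heads, and masking, and it does not reproduce how this paper integrates RealFormer into its decoder. The function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def residual_attention(q: Matrix, k: Matrix, v: Matrix,
                       prev_scores: Optional[Matrix] = None) -> Tuple[Matrix, Matrix]:
    """Single-head attention whose pre-softmax scores add the previous layer's scores.

    Returns (output, scores); pass `scores` on as `prev_scores` to the next layer.
    """
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    if prev_scores is not None:
        scores = [[s + p for s, p in zip(r, pr)] for r, pr in zip(scores, prev_scores)]
    weights = softmax_rows(scores)
    return matmul(weights, v), scores


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0]]
    out1, s1 = residual_attention(x, x, x)                  # first layer
    out2, s2 = residual_attention(x, x, x, prev_scores=s1)  # scores accumulate
    print(s1[0][0], s2[0][0])
```

Without `prev_scores` the function reduces to ordinary scaled dot-product attention, which is why the residual path adds no parameters.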
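The multi-task objective can be sketched as a weighted sum of a CTC loss on the encoder output and a cross-entropy loss on the attention decoder, in the style of hybrid CTC/attention training. The pure-Python code below computes the CTC negative log-likelihood with the standard forward algorithm in log space. The weight `ctc_weight = 0.3` and all function names are assumptions for illustration, since the abstract does not state the paper's weighting. In practice these losses would come from a framework (for example PyTorch's `torch.nn.CTCLoss`); the hand-written version only makes the computation explicit.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """log(exp(a) + exp(b)) without overflow."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def ctc_loss(log_probs: Sequence[Sequence[float]], target: Sequence[int],
             blank: int = 0) -> float:
    """Negative log-likelihood of `target` under CTC (forward algorithm).

    log_probs[t][c] is the log-probability of class c at frame t.
    """
    ext = [blank]
    for label in target:
        ext += [label, blank]  # blank-interleaved label sequence
    S, T = len(ext), len(log_probs)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            acc = alpha[s]
            if s >= 1:
                acc = log_add(acc, alpha[s - 1])
            # skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = log_add(acc, alpha[s - 2])
            new[s] = acc + log_probs[t][ext[s]]
        alpha = new
    total = alpha[S - 1] if S == 1 else log_add(alpha[S - 1], alpha[S - 2])
    return -total


def attention_ce_loss(log_probs: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """Summed cross-entropy of the decoder's per-token log-probabilities (teacher forcing)."""
    return -sum(step[token] for step, token in zip(log_probs, target))


def joint_loss(ctc_nll: float, att_nll: float, ctc_weight: float = 0.3) -> float:
    """Multi-task objective: weighted sum of the CTC and attention losses (assumed weight)."""
    return ctc_weight * ctc_nll + (1.0 - ctc_weight) * att_nll


if __name__ == "__main__":
    half = math.log(0.5)
    frames = [[half, half], [half, half]]  # 2 frames, classes {0: blank, 1: "a"}
    l_ctc = ctc_loss(frames, [1])          # paths "aa", "a_", "_a" give p = 0.75
    l_att = attention_ce_loss([[half, half]], [1])
    print(round(l_ctc, 4), round(joint_loss(l_ctc, l_att), 4))
```

The CTC branch enforces a monotonic alignment between audio frames and characters, while the attention branch models dependencies between output characters; weighting the two lets each regularize the other.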
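The character error rate reported above is conventionally computed as CER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-edit alignment between hypothesis and reference, and N is the number of reference characters. A minimal pure-Python sketch based on Levenshtein distance follows; the example sentences are made up and the function names are hypothetical.

```python
from typing import List


def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


if __name__ == "__main__":
    refs = ["今天天气很好"]
    hyps = ["今天天汽好"]  # one substitution (气 -> 汽) and one deletion (很)
    print(f"CER = {cer(refs, hyps):.2%}")  # CER = 33.33%
```

Character-level scoring is the usual choice for Mandarin because written Chinese has no whitespace word boundaries.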
Pages: 23-31