Multi-Task Chinese Speech Recognition Method Based on the Squeezeformer Model

Authors:

Guo, Ying [1]

Wang, Li [2]

Affiliations:

[1] School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China

[2] College of Computer Science and Technology Liaoning, Anshan, 114051, China

Source:

IAENG International Journal of Computer Science | 2025, Vol. 52, No. 1

Keywords:

Frequency modulation; Signal encoding; Speech enhancement

Abstract:

End-to-end training has emerged as a prominent trend in speech recognition, with Conformer models effectively integrating Transformer and CNN architectures. However, their complexity and high computational cost make deployment difficult. To address these issues, we propose a multi-task Chinese speech recognition method based on the Squeezeformer model. We replace the FMCF structure in Conformer with an MF/CF structure, treating the convolutional module as a local Multi-Head Attention (MHA) module to improve efficiency. Multilevel down-sampling and up-sampling with a time-series U-Net further reduce computational cost. By eliminating redundant LayerNorm layers and employing depthwise separable convolutions, we streamline the model, reduce its parameter count, and lower deployment cost. An Adaptor Layer is integrated into the MHSA module to mitigate the vanishing gradient problem, and a ScaleVar Layer is added to enhance flexibility. Additionally, the RealFormer module is introduced on the decoding side to improve context understanding. Combining Connectionist Temporal Classification (CTC) with attention-based encoder-decoder models for multi-task learning improves performance and accuracy. Experimental results show that the proposed method reduces the parameter count by 16% on the AISHELL-1 dataset and lowers the character error rate to 5.50%. It also performs well on the AISHELL-2 dataset. © (2025), (International Association of Engineers). All rights reserved.
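Two ideas from the abstract can be illustrated with a small sketch: the parameter savings of depthwise separable convolutions, and the weighted combination of a CTC loss with an attention decoder loss. This is not the authors' implementation. The function names, the channel width of 256, the kernel size of 31, and the CTC weight of 0.3 are all assumptions chosen for illustration, and the linear interpolation of the two losses is the common hybrid CTC/attention formulation rather than a weighting confirmed by the abstract.

```python
def conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a standard 1-D convolution (bias omitted)."""
    return c_in * c_out * kernel_size


def depthwise_separable_params(c_in, c_out, kernel_size):
    """Weight count of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = c_in * kernel_size   # one filter per input channel
    pointwise = c_in * c_out         # 1x1 conv that mixes channels
    return depthwise + pointwise


def joint_ctc_attention_loss(ctc_loss, attention_loss, ctc_weight=0.3):
    """Interpolate the two task losses; ctc_weight is a hypothetical default."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    standard = conv1d_params(256, 256, 31)
    separable = depthwise_separable_params(256, 256, 31)
    print(f"standard conv weights:  {standard}")
    print(f"separable conv weights: {separable}")
    print(f"joint loss: {joint_ctc_attention_loss(2.0, 1.0, ctc_weight=0.5)}")
```

In a real training loop the two loss arguments would come from the CTC branch and the attention decoder of the same encoder, so that both objectives shape the shared representation.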

Pages: 23 / 31
