Efficient conformer-based speech recognition with linear attention

Cited by: 0
Authors:
Li, Shengqiang [1]
Xu, Menglong [1]
Zhang, Xiao-Lei [1]
Affiliations:
[1] Northwestern Polytech Univ, CIAIC, Sch Marine Sci & Technol, Xian, Peoples R China
Keywords: (none listed)
DOI: not available
CLC number: TP [Automation technology, computer technology]
Discipline code: 0812
Abstract
Recently, conformer-based end-to-end automatic speech recognition, which outperforms recurrent-neural-network-based approaches, has received much attention. Although the conformer's parallel computation is more efficient than that of recurrent neural networks, the computational complexity of its dot-product self-attention is quadratic in the length of the input feature sequence. To reduce this cost, we propose multi-head linear self-attention for the self-attention layer, which reduces its computational complexity to linear order. In addition, we propose to factorize the feed-forward module of the conformer by low-rank matrix factorization, which reduces the number of parameters by approximately 50% with little performance loss. The proposed model, named linear attention based conformer (LAC), can be trained and decoded jointly with the connectionist temporal classification objective, which further improves its performance. To evaluate the effectiveness of LAC, we conduct experiments on the AISHELL-1 and LibriSpeech corpora. Results show that LAC achieves better performance than 7 recently proposed speech recognition models and is competitive with the state-of-the-art conformer. Meanwhile, LAC has only about 50% of the parameters of the conformer and trains faster.
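The two techniques in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes an elu(x)+1 feature map (a common positive kernel choice in linear-attention work) and an SVD-based low-rank split of a feed-forward weight matrix; all function names are hypothetical.

```python
import numpy as np

def feature_map(x):
    """elu(x) + 1: a positive feature map often used in linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Linear-complexity attention: associate phi(K)^T V first.

    Cost is O(T * d * d_v) instead of the O(T^2 * d) of dot-product
    softmax attention, because the T x T attention matrix is never formed.
    Q, K: (T, d); V: (T, d_v).
    """
    Qf, Kf = feature_map(Q), feature_map(K)   # (T, d)
    KV = Kf.T @ V                              # (d, d_v), computed once
    Z = Kf.sum(axis=0)                         # (d,) normalizer terms
    return (Qf @ KV) / (Qf @ Z)[:, None]       # (T, d_v)

def factorize_ffn(W, rank):
    """Low-rank factorization of a feed-forward weight W (d_in x d_out).

    Returns A (d_in x rank) and B (rank x d_out) with A @ B ~ W, so the
    parameter count drops from d_in*d_out to (d_in + d_out)*rank.
    E.g. d_in=256, d_out=1024, rank=102: 130,560 vs 262,144 (~50%).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]
```

Because the feature map is positive, each output row of `linear_attention` is a convex combination of the rows of `V`, mirroring how softmax attention mixes values, just without the quadratic score matrix.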
Pages: 448-453 (6 pages)
Related papers (50 in total)
  • [1] CONFORMER-BASED SPEECH RECOGNITION WITH LINEAR NYSTRÖM ATTENTION AND ROTARY POSITION EMBEDDING
    Samarakoon, Lahiru
    Leung, Tsun-Yat
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8012 - 8016
  • [2] Efficient Conformer-Based CTC Model for Intelligent Cockpit Speech Recognition
    Guo, Hanzhi
    Chen, Yunshu
    Xie, Xukang
    Xu, Gaopeng
    Guo, Wei
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 522 - 526
  • [3] Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
    Audhkhasi, Kartik
    Huang, Yinghui
    Ramabhadran, Bhuvana
    Moreno, Pedro J.
    [J]. INTERSPEECH 2022, 2022, : 1026 - 1030
  • [4] A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control
    Jiang, Peiyuan
    Pan, Weijun
    Zhang, Jian
    Wang, Teng
    Huang, Junxiang
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 77 (01): : 911 - 940
  • [5] Conformer-based End-to-end Speech Recognition With Rotary Position Embedding
    Li, Shengqiang
    Xu, Menglong
    Zhang, Xiao-Lei
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 443 - 447
  • [6] CMGAN: Conformer-based Metric GAN for Speech Enhancement
    Cao, Ruizhe
    Abdulatif, Sherif
    Yang, Bin
    [J]. INTERSPEECH 2022, 2022, : 936 - 940
  • [7] EFFICIENT CONFORMER: PROGRESSIVE DOWNSAMPLING AND GROUPED ATTENTION FOR AUTOMATIC SPEECH RECOGNITION
    Burchi, Maxime
    Vielzeuf, Valentin
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 8 - 15
  • [8] Universal and accent-discriminative encoders for conformer-based accent-invariant speech recognition
    Wang X.
    Long Y.
    Xu D.
    [J]. International Journal of Speech Technology, 2022, 25 (4) : 987 - 995
  • [9] Enhanced Conformer-Based Speech Recognition via Model Fusion and Adaptive Decoding with Dynamic Rescoring
    Geng, Junhao
    Jia, Dongyao
    He, Zihao
    Wu, Nengkai
    Li, Ziqi
    [J]. Applied Sciences (Switzerland), 2024, 14 (24):
  • [10] CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
    Abdulatif, Sherif
    Cao, Ruizhe
    Yang, Bin
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2477 - 2493