SFA: Searching faster architectures for end-to-end automatic speech recognition models

Cited by: 3
Authors
Liu, Yukun
Li, Ta
Zhang, Pengyuan [1]
Yan, Yonghong
Affiliation
[1] Chinese Academy of Sciences, Institute of Acoustics, Key Laboratory of Speech Acoustics and Content Understanding, Beijing, People's Republic of China
Keywords
Automatic speech recognition; Model acceleration; Neural architecture search; Attention
DOI
10.1016/j.csl.2023.101500
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, end-to-end (E2E) automatic speech recognition (ASR) has been widely adopted because of its advantages over the hybrid approach. Although existing E2E ASR models achieve impressive performance, they usually have large model sizes and suffer from slow inference in real-world applications. To obtain faster E2E ASR models, this paper proposes SFA (Searching Faster Architectures), which searches for faster architectures using neural architecture search (NAS). SFA consists of a search space containing a set of candidate architectures and a search algorithm that finds the optimal architecture within that space. On the one hand, SFA designs a topology-fused search space that integrates the topologies of existing architectures (e.g., Transformer and Conformer) and also explores entirely new ones. On the other hand, building on the training criterion of E2E ASR, SFA develops a speed-aware differentiable search algorithm that searches for faster architectures according to the target hardware device. Additionally, a connectionist temporal classification (CTC) based progressive search algorithm is proposed to reduce the difficulty of the architecture search and obtain better performance. On two commonly used Mandarin datasets, SFA effectively improves the inference speed of existing E2E ASR models with comparable recognition performance, achieving up to 2.46× CPU and 1.98× GPU speedups over the best human-designed baselines.
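The abstract does not give the equations of the speed-aware search. One common way to make latency differentiable, assumed here rather than taken from the paper, is a DARTS-style relaxation: each layer l mixes its candidate operations with weights softmax(α_l), the expected latency is the sum over layers of Σ_i softmax(α_l)_i · t_{l,i}, where t_{l,i} is looked up from measurements on the target device, and the architecture parameters are updated to minimize L_task + λ · E[latency]. The Python sketch below implements only that latency term, its analytic gradient, and the resulting update step. The helper names, candidate operations, latency values, λ (`lam`) and learning rate are all hypothetical, and this is not the SFA implementation.

```python
import math

# Hypothetical sketch of a speed-aware differentiable architecture search step.
# Assumes a DARTS-style relaxation: each layer mixes candidate ops with softmax
# weights over architecture parameters alpha, and each op has a latency (ms)
# measured beforehand on the target device. Not the SFA authors' code.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def expected_latency(alphas, latency_table):
    """Differentiable latency estimate: per layer, the softmax-weighted sum of op latencies."""
    return sum(
        sum(p * t for p, t in zip(softmax(a), lat))
        for a, lat in zip(alphas, latency_table)
    )

def latency_grad(alphas, latency_table):
    """Analytic gradient of expected_latency w.r.t. alpha.
    With p = softmax(a) and E = sum_j p_j * t_j, dE/da_i = p_i * (t_i - E)."""
    grads = []
    for a, lat in zip(alphas, latency_table):
        p = softmax(a)
        e_layer = sum(pi * ti for pi, ti in zip(p, lat))
        grads.append([pi * (ti - e_layer) for pi, ti in zip(p, lat)])
    return grads

def search_step(alphas, task_grads, latency_table, lam=0.1, lr=0.5):
    """One gradient step on L_task + lam * expected_latency.
    task_grads would come from backpropagating the ASR loss (e.g. CTC)
    through the mixed ops; here the caller supplies them directly."""
    lat_grads = latency_grad(alphas, latency_table)
    return [
        [a - lr * (gt + lam * gl) for a, gt, gl in zip(la, lt, ll)]
        for la, lt, ll in zip(alphas, task_grads, lat_grads)
    ]

def derive_architecture(alphas, op_names):
    """Keep the highest-weighted candidate op in each layer."""
    return [op_names[max(range(len(a)), key=a.__getitem__)] for a in alphas]

# Hypothetical candidate ops and latencies (ms) for a 2-layer search space.
OP_NAMES = ["self_attention", "convolution", "feed_forward"]
LATENCY = [[4.0, 2.0, 1.0], [4.0, 2.0, 1.0]]

alphas = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zero_task_grads = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
for _ in range(20):
    alphas = search_step(alphas, zero_task_grads, LATENCY)
print(derive_architecture(alphas, OP_NAMES))  # -> ['feed_forward', 'feed_forward']
```

With the task gradients set to zero, only the latency term acts, so the search drifts toward the cheapest candidate in every layer. In a real search the ASR loss gradient would pull in the opposite direction wherever a slower operation is more accurate, and λ sets the trade-off between recognition accuracy and speed on the chosen device.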
Pages: 14
Related papers (50 in total)
  • [1] Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    INTERSPEECH 2019, 2019, pp. 76-80
  • [2] An Overview of End-to-End Automatic Speech Recognition
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    SYMMETRY-BASEL, 2019, 11(8)
  • [3] An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models
    Sim, Khe Chai
    Zadrazil, Petr
    Beaufays, Francoise
    INTERSPEECH 2019, 2019, pp. 774-778
  • [4] LWMD: A Comprehensive Compression Platform for End-to-End Automatic Speech Recognition Models
    Liu, Yukun
    Li, Ta
    Zhang, Pengyuan
    Yan, Yonghong
    APPLIED SCIENCES-BASEL, 2023, 13(3)
  • [5] End-To-End deep neural models for Automatic Speech Recognition for Polish Language
    Pondel-Sycz, Karolina
    Pietrzak, Agnieszka Paula
    Szymla, Julia
    INTERNATIONAL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2024, 70(2), pp. 315-321
  • [6] Incremental Learning for End-to-End Automatic Speech Recognition
    Fu, Li
    Li, Xiaoxiao
    Zi, Libo
    Zhang, Zhengchen
    Wu, Youzheng
    He, Xiaodong
    Zhou, Bowen
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, pp. 320-327
  • [7] Recent Advances in End-to-End Automatic Speech Recognition
    Li, Jinyu
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11(1)
  • [8] Inverted Alignments for End-to-End Automatic Speech Recognition
    Doetsch, Patrick
    Hannemann, Mirko
    Schluter, Ralf
    Ney, Hermann
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11(8), pp. 1265-1273
  • [9] Evaluating the Vulnerability of End-to-End Automatic Speech Recognition Models To Membership Inference Attacks
    Shah, Muhammad A.
    Szurley, Joseph
    Mueller, Markus
    Mouchtaris, Athanasios
    Droppo, Jasha
    INTERSPEECH 2021, 2021, pp. 891-895
  • [10] End-to-End Neural Segmental Models for Speech Recognition
    Tang, Hao
    Lu, Liang
    Kong, Lingpeng
    Gimpel, Kevin
    Livescu, Karen
    Dyer, Chris
    Smith, Noah A.
    Renals, Steve
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11(8), pp. 1254-1264