SFA: Searching faster architectures for end-to-end automatic speech recognition models

Cited by: 3
Authors
Liu, Yukun
Li, Ta
Zhang, Pengyuan [1]
Yan, Yonghong
Affiliations
[1] Chinese Academy of Sciences, Institute of Acoustics, Key Laboratory of Speech Acoustics and Content Understanding, Beijing, People's Republic of China
Source
Keywords
Automatic speech recognition; Model acceleration; Neural architecture search; Attention
DOI
10.1016/j.csl.2023.101500
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, end-to-end (E2E) automatic speech recognition (ASR) has been widely adopted because of its advantages over the hybrid method. Although existing E2E ASR models achieve impressive performance, they usually have a large model size and suffer from slow inference in real-world applications. To obtain faster models for E2E ASR, this paper proposes SFA, which searches for faster architectures with the help of neural architecture search (NAS). SFA consists of a search space that contains a set of candidate architectures and a search algorithm responsible for finding the optimal architecture in that search space. On the one hand, SFA designs a topology-fused search space that integrates the topologies of existing architectures (e.g. Transformer, Conformer) and allows new ones to be explored. On the other hand, combined with the training criterion of E2E ASR, SFA develops a speed-aware differentiable search algorithm that searches for faster architectures according to the target hardware device. Additionally, a progressive search algorithm based on connectionist temporal classification (CTC) is proposed to reduce the difficulty of the architecture search and obtain better performance. On two commonly used Mandarin datasets, SFA effectively improves the inference speed of existing E2E ASR models while keeping comparable performance, achieving up to 2.46x/1.98x CPU/GPU speedup over the best human-designed baselines.
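The abstract does not give the formulation of the speed-aware differentiable search, so the following is only a minimal sketch of one common way such a search objective is built, not the authors' method. It assumes a softmax relaxation over candidate operations and a penalty on the expected latency measured on the target device. The function names, the `weight` parameter, and the latency values are all hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of architecture logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_latency(alpha, latency_ms):
    """Expected latency of one searchable layer under the relaxed choice.

    alpha: architecture logits, one per candidate operation (assumed).
    latency_ms: per-operation latency measured on the target device (assumed).
    """
    probs = softmax(alpha)
    return sum(p * t for p, t in zip(probs, latency_ms))


def speed_aware_loss(task_loss, alpha, latency_ms, weight=0.1):
    """Task loss plus a weighted latency penalty (hypothetical weighting)."""
    return task_loss + weight * expected_latency(alpha, latency_ms)


def latency_grad(alpha, latency_ms):
    """Gradient of the expected latency with respect to each logit.

    For E = sum_i p_i * t_i with p = softmax(alpha), dE/dalpha_j = p_j * (t_j - E).
    """
    probs = softmax(alpha)
    mean = sum(p * t for p, t in zip(probs, latency_ms))
    return [p * (t - mean) for p, t in zip(probs, latency_ms)]


if __name__ == "__main__":
    alpha = [0.0, 0.0, 0.0]        # three candidate operations, equally likely
    latency_ms = [3.0, 6.0, 9.0]   # made-up latencies for illustration only
    print(expected_latency(alpha, latency_ms))
    print(speed_aware_loss(1.0, alpha, latency_ms))
    print(latency_grad(alpha, latency_ms))
```

Under these assumptions, a gradient-descent step on the latency term raises the logit of the fastest operation and lowers that of the slowest, which is how a latency penalty of this kind steers a differentiable search toward faster architectures on a given device.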
Pages: 14