SFA: Searching faster architectures for end-to-end automatic speech recognition models

Cited by: 3
Authors
Liu, Yukun
Li, Ta
Zhang, Pengyuan [1]
Yan, Yonghong
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China
Source
Keywords
Automatic speech recognition; Model acceleration; Neural architecture search; Attention
DOI
10.1016/j.csl.2023.101500
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, end-to-end (E2E) automatic speech recognition (ASR) has been widely adopted because of its advantages over the hybrid method. Although existing E2E ASR models achieve impressive performance, they usually have a large model size and suffer from slow inference in real-world applications. To obtain faster E2E ASR models, this paper proposes SFA, which searches for faster architectures using neural architecture search (NAS). SFA consists of a search space containing a set of candidate architectures and a search algorithm responsible for finding the optimal architecture in that space. On the one hand, SFA designs a topology-fused search space that integrates the topologies of existing architectures (e.g., Transformer, Conformer) and explores new ones. On the other hand, combined with the training criterion of E2E ASR, SFA develops a speed-aware differentiable search algorithm that searches for faster architectures according to the target hardware device. Additionally, a progressive search algorithm based on connectionist temporal classification (CTC) is proposed to reduce the difficulty of the architecture search and obtain better performance. On two commonly used Mandarin datasets, SFA effectively improves the inference speed of existing E2E ASR models while maintaining comparable performance, achieving up to 2.46×/1.98× CPU/GPU speedup over the best human-designed baselines.
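The abstract does not give the formulation of the speed-aware differentiable search, so the following is only a minimal sketch of the general idea as it commonly appears in latency-aware differentiable NAS, not SFA's actual algorithm. It assumes that each layer chooses among candidate operations via a softmax over learnable architecture logits, and that a latency penalty is added to the task loss. The operation names, latency values, `penalty_weight`, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_latency(arch_logits, op_latencies_ms):
    """Latency of the relaxed (soft) choice: sum of p_i * latency_i."""
    probs = softmax(arch_logits)
    return sum(p * lat for p, lat in zip(probs, op_latencies_ms))


def speed_aware_loss(task_loss, arch_logits, op_latencies_ms, penalty_weight):
    """Task loss plus a weighted latency penalty (assumed additive form)."""
    penalty = penalty_weight * expected_latency(arch_logits, op_latencies_ms)
    return task_loss + penalty


# Hypothetical candidates; the latencies are made-up numbers standing in
# for measurements taken on the target device (CPU or GPU).
candidate_ops = ["self_attention", "convolution", "feed_forward"]
op_latencies_ms = [4.0, 2.0, 1.0]

uniform_logits = [0.0, 0.0, 0.0]   # no preference among candidates
fast_logits = [-2.0, 0.0, 2.0]     # preference shifted toward cheaper ops

uniform_loss = speed_aware_loss(1.5, uniform_logits, op_latencies_ms, 0.1)
fast_loss = speed_aware_loss(1.5, fast_logits, op_latencies_ms, 0.1)
```

Because the penalty is a smooth function of the architecture logits, a gradient-based optimizer can trade accuracy against measured speed. In this toy setup, shifting probability mass toward cheaper operations lowers the penalty while the task loss is held fixed.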
Pages: 14