LATENCY-CONTROLLED NEURAL ARCHITECTURE SEARCH FOR STREAMING SPEECH RECOGNITION

Cited by: 0
Authors
He, Liqiang [1]
Feng, Shulin [1]
Su, Dan [1]
Yu, Dong [2]
Affiliations
[1] Tencent AI Lab, Shenzhen, People's Republic of China
[2] Tencent AI Lab, Bellevue, WA USA
Keywords
neural architecture search; low latency; streaming/online speech recognition
DOI
10.1109/ASRU51503.2021.9688058
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neural architecture search (NAS) has attracted much attention and has been explored for automatic speech recognition (ASR). In this work, we focus on streaming ASR scenarios and propose latency-controlled NAS for acoustic modeling. First, starting from the vanilla neural architecture, normal cells are converted to causal cells to control the total latency of the architecture. Second, a revised operation space with a smaller receptive field is proposed to generate a final architecture with low latency. Extensive experiments show that: 1) with the proposed neural architecture, networks with a medium latency of 550 ms and a low latency of 190 ms can be learned in the vanilla and revised operation spaces, respectively; 2) in the low-latency setting, the evaluation network achieves an average relative improvement of more than 19% across the four test sets over the hybrid CLDNN baseline, on a 10k-hour large-scale dataset.
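The link the abstract draws between causal cells, receptive field, and latency can be illustrated with a small calculation. The sketch below is not the authors' method: it assumes a plain stack of 1-D convolutions with no striding, a hypothetical `ConvOp` description of each operation, and a 10 ms frame shift, none of which are stated in the record. It only shows that making operations causal, or shrinking their kernels, reduces the number of future frames the network must wait for.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvOp:
    """Hypothetical description of one 1-D convolution in a stack."""
    kernel_size: int       # number of taps, assumed odd
    dilation: int = 1
    causal: bool = False   # True: the op looks only at past frames


def lookahead_frames(ops: List[ConvOp]) -> int:
    """Future frames needed by a stride-1 stack of convolutions."""
    total = 0
    for op in ops:
        if not op.causal:
            # A symmetric kernel sees (k - 1) / 2 taps to the right,
            # each spaced `dilation` frames apart.
            total += (op.kernel_size - 1) // 2 * op.dilation
    return total


def lookahead_latency_ms(ops: List[ConvOp], frame_shift_ms: float = 10.0) -> float:
    """Look-ahead latency, assuming one frame every `frame_shift_ms`."""
    return lookahead_frames(ops) * frame_shift_ms


# Four symmetric ops with a wide receptive field.
wide = [ConvOp(kernel_size=5, dilation=2) for _ in range(4)]
# Same ops, but the first two are made causal.
half_causal = [ConvOp(5, 2, causal=True)] * 2 + [ConvOp(5, 2)] * 2
# Ops drawn from a space with a smaller receptive field.
narrow = [ConvOp(kernel_size=3, dilation=1) for _ in range(4)]

print(lookahead_latency_ms(wide))         # 160.0
print(lookahead_latency_ms(half_causal))  # 80.0
print(lookahead_latency_ms(narrow))       # 40.0
```

The numbers are illustrative only and are unrelated to the 550 ms and 190 ms figures reported in the abstract.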
Pages: 62-67
Number of pages: 6
Related papers
  • [1] IMPROVING LATENCY-CONTROLLED BLSTM ACOUSTIC MODELS FOR ONLINE SPEECH RECOGNITION
    Xue, Shaofei
    Yan, Zhijie
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017: 5340-5344
  • [2] NEURAL ARCHITECTURE SEARCH FOR SPEECH EMOTION RECOGNITION
    Wu, Xixin
    Hu, Shoukang
    Wu, Zhiyong
    Liu, Xunying
    Meng, Helen
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 6902-6906
  • [3] EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition
    Sun, Haiyang
    Lian, Zheng
    Liu, Bin
    Li, Ying
    Sun, Licai
    Cai, Cong
    Tao, Jianhua
    Wang, Meng
    Cheng, Yuan
    INTERSPEECH 2023, 2023: 3597-3601
  • [4] NEURAL LATTICE SEARCH FOR SPEECH RECOGNITION
    Ma, Rao
    Li, Hao
    Liu, Qi
    Chen, Lu
    Yu, Kai
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 7794-7798
  • [5] Low-latency transformer model for streaming automatic speech recognition
    Miao, Haoran
    Cheng, Gaofeng
    Zhang, Pengyuan
    ELECTRONICS LETTERS, 2022, 58 (01): 44-46
  • [6] Multihardware Adaptive Latency Prediction for Neural Architecture Search
    Lin, Chengmin
    Yang, Pengfei
    Wang, Quan
    Guo, Yitong
    Wang, Zhenyi
    IEEE INTERNET OF THINGS JOURNAL, 2025, 12 (03): 3385-3398
  • [7] MULTILINGUAL SPEECH EMOTION RECOGNITION WITH MULTI-GATING MECHANISM AND NEURAL ARCHITECTURE SEARCH
    Wang, Zihan
    Meng, Qi
    Lan, HaiFeng
    Zhang, XinRui
    Guo, KeHao
    Gupta, Akshat
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022: 806-813
  • [8] Lightweight End-to-End Architecture for Streaming Speech Recognition
    Yang S.
    Li X.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2023, 36 (03): 268-279
  • [9] Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition
    Kim, Jihwan
    Wang, Jisung
    Kim, Sangki
    Lee, Yeha
    INTERSPEECH 2020, 2020: 1788-1792
  • [10] Low Latency End-to-End Streaming Speech Recognition with a Scout Network
    Wang, Chengyi
    Wu, Yu
    Lu, Liang
    Liu, Shujie
    Li, Jinyu
    Ye, Guoli
    Zhou, Ming
    INTERSPEECH 2020, 2020: 2112-2116