Iterative Compression of End-to-End ASR Model using AutoML

Cited by: 3
Authors
Mehrotra, Abhinav [1]
Dudziak, Lukasz [1]
Yeo, Jinsu [2]
Lee, Young-yoon [2]
Vipperla, Ravichander [1]
Abdelfattah, Mohamed S. [1]
Bhattacharya, Sourav [1]
Ishtiaq, Samin [1]
Ramos, Alberto Gil C. P. [1]
Lee, SangJeong [2]
Kim, Daehyun [2]
Lane, Nicholas D. [1, 3]
Affiliations
[1] Samsung AI Ctr, Cambridge, England
[2] Samsung Res, On Device Lab, Seoul, South Korea
[3] Univ Cambridge, Cambridge, England
Source
Keywords
ASR Compression; AutoML; Reinforcement Learning
DOI
10.21437/Interspeech.2020-1894
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
Increasing demand for on-device Automatic Speech Recognition (ASR) systems has resulted in renewed interest in developing automatic model compression techniques. Past research has shown that an AutoML-based Low Rank Factorization (LRF) technique, when applied to an end-to-end Encoder-Attention-Decoder style ASR model, can achieve a speedup of up to 3.7x, outperforming laborious manual rank-selection approaches. However, we show that current AutoML-based search techniques only work up to a certain compression level, beyond which they fail to produce compressed models with acceptable word error rates (WER). In this work, we propose an iterative AutoML-based LRF approach that achieves over 5x compression without degrading the WER, thereby advancing the state-of-the-art in ASR compression.
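As a rough illustration of why rank selection matters in LRF, the sketch below counts parameters when a single m x n weight matrix is replaced by two factors of shapes m x r and r x n. The 1024 x 1024 layer shape and all function names are assumptions made for this example and do not come from the paper. The abstract's figures refer to a whole model whose ranks are chosen by an AutoML search, which this per-layer arithmetic does not attempt to reproduce.

```python
import math


def factorized_params(m: int, n: int, rank: int) -> int:
    """Parameters in the two factors (m x rank) and (rank x n)."""
    return rank * (m + n)


def compression_ratio(m: int, n: int, rank: int) -> float:
    """Original parameter count divided by the factorized count."""
    return (m * n) / factorized_params(m, n, rank)


def max_rank_for_ratio(m: int, n: int, target_ratio: float) -> int:
    """Largest rank whose factorization still meets the target ratio.

    Solves m * n / (rank * (m + n)) >= target_ratio for an integer rank.
    """
    return math.floor((m * n) / (target_ratio * (m + n)))


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    m, n = 1024, 1024
    rank = max_rank_for_ratio(m, n, target_ratio=5.0)
    print(rank, factorized_params(m, n, rank), compression_ratio(m, n, rank))
```

For this assumed layer, a 5x reduction in parameters requires a rank of at most 102 out of a possible 1024, which suggests how aggressive the rank truncation becomes at the compression levels the abstract describes.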
Pages: 3361-3365
Page count: 5