Cascaded encoders for fine-tuning ASR models on overlapped speech

Cited by: 0
Authors
Rose, Richard [1 ]
Chang, Oscar [1 ]
Siohan, Olivier [1 ]
Affiliations
[1] Google Inc, New York, NY 10011 USA
Source
INTERSPEECH 2023
Keywords
multi-talker speech recognition;
DOI
10.21437/Interspeech.2023-1952
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Multi-talker automatic speech recognition (MT-ASR) has been shown to improve ASR performance on speech containing overlapping utterances from more than one speaker. While MT-ASR models have typically been trained from scratch on simulated overlapping speech datasets, an underlying goal is generally that these models also achieve state-of-the-art performance on single-speaker utterances. This implies that they must be competitive with the best available fine-tuned speech models, which have been trained on massive datasets collected from a wide variety of task domains. This paper presents an MT-ASR model formed by combining a well-trained foundation model with a multi-talker mask model in a cascaded RNN-T encoder configuration. Experimental results show that the cascaded configuration improves WER on overlapping utterances relative to a baseline multi-talker model, with minimal impact on the performance the foundation model achieves on non-overlapping utterances.
Pages: 3457-3461
Number of pages: 5
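
The paper itself does not include code; as a rough illustration of the cascaded-encoder idea described in the abstract, the sketch below wires a stand-in pretrained foundation encoder to a second, cascaded multi-talker encoder feeding a standard RNN-T prediction and joint network. It is a minimal PyTorch sketch under stated assumptions: the module choices, sizes, and the overlapped/non-overlapped switch are illustrative, not the authors' implementation (which builds on a large-scale foundation model and a multi-talker mask model).

    # Minimal sketch of a cascaded-encoder wiring for multi-talker ASR.
    # Assumption: a PyTorch-style setup; module names and dimensions are
    # illustrative, not the paper's actual architecture.
    import torch
    import torch.nn as nn

    class CascadedMTEncoder(nn.Module):
        def __init__(self, feat_dim=80, enc_dim=512, vocab_size=1024):
            super().__init__()
            # Stand-in for the well-trained foundation encoder
            # (frozen or lightly fine-tuned in practice).
            base_layer = nn.TransformerEncoderLayer(d_model=enc_dim, nhead=8, batch_first=True)
            self.input_proj = nn.Linear(feat_dim, enc_dim)
            self.foundation = nn.TransformerEncoder(base_layer, num_layers=4)
            # Cascaded multi-talker encoder: re-encodes the foundation
            # activations for overlapped speech (the real model uses a
            # multi-talker mask model here).
            mt_layer = nn.TransformerEncoderLayer(d_model=enc_dim, nhead=8, batch_first=True)
            self.mt_encoder = nn.TransformerEncoder(mt_layer, num_layers=2)
            # Standard RNN-T prediction and joint networks.
            self.embed = nn.Embedding(vocab_size, enc_dim)
            self.predictor = nn.LSTM(enc_dim, enc_dim, batch_first=True)
            self.joint = nn.Linear(enc_dim, vocab_size)

        def forward(self, feats, labels, overlapped=True):
            # feats: (B, T, feat_dim) acoustic features; labels: (B, U) token ids.
            x = self.foundation(self.input_proj(feats))   # shared foundation path
            if overlapped:
                x = self.mt_encoder(x)                    # cascade used for overlapped speech
            y, _ = self.predictor(self.embed(labels))     # (B, U, enc_dim)
            # RNN-T joint: combine every encoder frame with every predictor step.
            logits = self.joint(torch.tanh(x.unsqueeze(2) + y.unsqueeze(1)))
            return logits                                 # (B, T, U, vocab_size)

In a fine-tuning setup along these lines, the foundation encoder would typically be kept frozen (or updated with a small learning rate) while only the cascaded encoder, and optionally the joint network, is trained on simulated overlapping speech.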