Cascaded encoders for fine-tuning ASR models on overlapped speech

Times Cited: 0
Authors
Rose, Richard [1]
Chang, Oscar [1]
Siohan, Olivier [1]
Affiliations
[1] Google Inc, New York, NY 10011 USA
Source
INTERSPEECH 2023
Keywords
multi-talker speech recognition;
DOI
10.21437/Interspeech.2023-1952
CLC Number
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Multi-talker automatic speech recognition (MT-ASR) has been shown to improve ASR performance on speech containing overlapping utterances from more than one speaker. While MT-ASR models have typically been trained from scratch on simulated overlapping speech datasets, there is generally an underlying goal that these models also achieve state-of-the-art performance on single-speaker utterances. This implies that they must be competitive with the best available fine-tuned speech models, which have been trained on massive datasets collected from a wide variety of task domains. This paper presents an MT-ASR model formed by combining a well-trained foundation model with a multi-talker mask model in a cascaded RNN-T encoder configuration. Experimental results show that the cascaded configuration improves WER on overlapping speech utterances relative to a baseline multi-talker model, with minimal impact on the performance achievable by the foundation model on non-overlapping utterances.
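The abstract describes an architecture in which a multi-talker encoder is cascaded on top of a well-trained foundation encoder and the combined stack feeds an RNN-T. The snippet below is a minimal sketch of that layering only, not the authors' implementation: the module names, layer sizes, use of LSTM layers, the simple linear stand-in for the RNN-T joint, and the choice to freeze the foundation encoder (shown here as one plausible way to preserve its single-speaker accuracy) are all illustrative assumptions.

```python
# Minimal sketch of a cascaded-encoder configuration (illustrative assumptions only):
# a frozen, pre-trained "foundation" encoder followed by a small trainable
# multi-talker encoder whose outputs feed an RNN-T-style output projection.

import torch
import torch.nn as nn


class FoundationEncoder(nn.Module):
    """Stand-in for a large pre-trained ASR encoder (kept frozen here)."""

    def __init__(self, feat_dim=80, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=4, batch_first=True)

    def forward(self, features):                      # (B, T, feat_dim)
        encoded, _ = self.lstm(features)              # (B, T, hidden_dim)
        return encoded


class MultiTalkerEncoder(nn.Module):
    """Small trainable encoder cascaded on the foundation encoder's outputs.

    This is the part fine-tuned on simulated overlapping speech, so the
    cascade handles multi-talker inputs while the foundation model is untouched.
    """

    def __init__(self, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, num_layers=2, batch_first=True)

    def forward(self, encoded):
        refined, _ = self.lstm(encoded)
        return refined


class CascadedRNNTEncoder(nn.Module):
    """Foundation encoder -> multi-talker encoder -> (placeholder) RNN-T joint."""

    def __init__(self, feat_dim=80, hidden_dim=512, vocab_size=1024):
        super().__init__()
        self.foundation = FoundationEncoder(feat_dim, hidden_dim)
        self.multi_talker = MultiTalkerEncoder(hidden_dim)
        self.joint_proj = nn.Linear(hidden_dim, vocab_size)  # stand-in for the RNN-T joint

        # Freeze the foundation encoder so only the cascaded multi-talker
        # encoder (and the output projection) are updated during fine-tuning.
        for p in self.foundation.parameters():
            p.requires_grad = False

    def forward(self, features):
        with torch.no_grad():
            base = self.foundation(features)
        refined = self.multi_talker(base)
        return self.joint_proj(refined)               # (B, T, vocab_size)


if __name__ == "__main__":
    model = CascadedRNNTEncoder()
    logits = model(torch.randn(2, 100, 80))           # 2 utterances, 100 frames
    print(logits.shape)                               # torch.Size([2, 100, 1024])
```

In this sketch, only the cascaded multi-talker encoder receives gradients during fine-tuning on overlapped speech, which mirrors the abstract's stated goal of improving WER on overlapping utterances with minimal impact on the foundation model's single-speaker performance.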
Pages: 3457 - 3461
Page Count: 5