End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning

被引:12
|
作者
Denisov, Pavel [1 ]
Ngoc Thang Vu [1 ]
机构
[1] Univ Stuttgart, Inst Nat Language Proc IMS, Stuttgart, Germany
来源
关键词
end-to-end asr; overlapped speech;
D O I
10.21437/Interspeech.2019-1130
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
This paper presents our latest investigation on end-to-end automatic speech recognition (ASR) for overlapped speech. We propose to train an end-to-end system conditioned on speaker embeddings and further improved by transfer learning from clean speech. This proposed framework does not require any parallel non-overlapped speech materials and is independent of the number of speakers. Our experimental results on overlapped speech datasets show that joint conditioning on speaker embeddings and transfer learning significantly improves the ASR performance.
引用
下载
收藏
页码:4425 / 4429
页数:5
相关论文
共 50 条
  • [1] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION
    Settle, Shane
    Le Roux, Jonathan
    Hori, Takaaki
    Watanabe, Shinji
    Hershey, John R.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4819 - 4823
  • [2] End-to-End Multilingual Multi-Speaker Speech Recognition
    Seki, Hiroshi
    Hori, Takaaki
    Watanabe, Shinji
    Le Roux, Jonathan
    Hershey, John R.
    INTERSPEECH 2019, 2019, : 3755 - 3759
  • [3] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION WITH TRANSFORMER
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6134 - 6138
  • [4] A Purely End-to-end System for Multi-speaker Speech Recognition
    Seki, Hiroshi
    Hori, Takaaki
    Watanabe, Shinji
    Le Roux, Jonathan
    Hershey, John R.
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2620 - 2630
  • [5] MIMO-SPEECH: END-TO-END MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 237 - 244
  • [6] Real-time End-to-End Monaural Multi-speaker Speech Recognition
    Li, Song
    Ouyang, Beibei
    Tong, Fuchuan
    Liao, Dexin
    Li, Lin
    Hong, Qingyang
    INTERSPEECH 2021, 2021, : 3750 - 3754
  • [7] Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?
    Cooper, Erica
    Lai, Cheng-, I
    Yasuda, Yusuke
    Yamagishi, Junichi
    INTERSPEECH 2020, 2020, : 3979 - 3983
  • [8] SPEAKER-AWARE TRAINING OF ATTENTION-BASED END-TO-END SPEECH RECOGNITION USING NEURAL SPEAKER EMBEDDINGS
    Rouhe, Aku
    Kaseva, Tuomas
    Kurimo, Mikko
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7064 - 7068
  • [9] FOCUSING ON ATTENTION: PROSODY TRANSFER AND ADAPTATIVE OPTIMIZATION STRATEGY FOR MULTI-SPEAKER END-TO-END SPEECH SYNTHESIS
    Fu, Ruibo
    Tao, Jianhua
    Wen, Zhengqi
    Yi, Jiangyan
    Wang, Tao
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6709 - 6713
  • [10] END-TO-END MULTI-SPEAKER ASR WITH INDEPENDENT VECTOR ANALYSIS
    Scheibler, Robin
    Zhang, Wangyou
    Chang, Xuankai
    Watanabe, Shinji
    Qian, Yanmin
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 496 - 501