EXTENDED GRAPH TEMPORAL CLASSIFICATION FOR MULTI-SPEAKER END-TO-END ASR

被引:0
|
作者
Chang, Xuankai [1 ,2 ]
Moritz, Niko [1 ]
Hori, Takaaki [1 ]
Watanabe, Shinji [2 ]
Le Roux, Jonathan [1 ]
机构
[1] Mitsubishi Elect Res Labs MERL, Cambridge, MA 02139 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
关键词
CTC; GTC; WFST; end-to-end ASR; multispeaker overlapped speech;
D O I
10.1109/ICASSP43922.2022.9747375
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Graph-based temporal classification (GTC), a generalized form of the connectionist temporal classification loss, was recently proposed to improve automatic speech recognition (ASR) systems using graph-based supervision. For example, GTC was first used to encode an N-best list of pseudo-label sequences into a graph for semi-supervised learning. In this paper, we propose an extension of GTC to model the posteriors of both labels and label transitions by a neural network, which can be applied to a wider range of tasks. As an example application, we use the extended GTC (GTC-e) for the multi-speaker speech recognition task. The transcriptions and speaker information of multi-speaker speech are represented by a graph, where the speaker information is associated with the transitions and ASR outputs with the nodes. Using GTC-e, multi-speaker ASR modelling becomes very similar to single-speaker ASR modeling, in that tokens by multiple speakers are recognized as a single merged sequence in chronological order. For evaluation, we perform experiments on a simulated multi-speaker speech dataset derived from LibriSpeech, obtaining promising results with performance close to classical benchmarks for the task.
引用
收藏
页码:7322 / 7326
页数:5
相关论文
共 50 条
  • [1] END-TO-END MULTI-SPEAKER ASR WITH INDEPENDENT VECTOR ANALYSIS
    Scheibler, Robin
    Zhang, Wangyou
    Chang, Xuankai
    Watanabe, Shinji
    Qian, Yanmin
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 496 - 501
  • [2] END-TO-END MONAURAL MULTI-SPEAKER ASR SYSTEM WITHOUT PRETRAINING
    Chang, Xuankai
    Qian, Yanmin
    Yu, Kai
    Watanabe, Shinji
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6256 - 6260
  • [3] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION
    Settle, Shane
    Le Roux, Jonathan
    Hori, Takaaki
    Watanabe, Shinji
    Hershey, John R.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4819 - 4823
  • [4] End-to-End Multilingual Multi-Speaker Speech Recognition
    Seki, Hiroshi
    Hori, Takaaki
    Watanabe, Shinji
    Le Roux, Jonathan
    Hershey, John R.
    [J]. INTERSPEECH 2019, 2019, : 3755 - 3759
  • [5] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION WITH TRANSFORMER
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6134 - 6138
  • [6] Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?
    Cooper, Erica
    Lai, Cheng-, I
    Yasuda, Yusuke
    Yamagishi, Junichi
    [J]. INTERSPEECH 2020, 2020, : 3979 - 3983
  • [7] A Purely End-to-end System for Multi-speaker Speech Recognition
    Seki, Hiroshi
    Hori, Takaaki
    Watanabe, Shinji
    Le Roux, Jonathan
    Hershey, John R.
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2620 - 2630
  • [8] End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
    Denisov, Pavel
    Ngoc Thang Vu
    [J]. INTERSPEECH 2019, 2019, : 4425 - 4429
  • [9] Real-time End-to-End Monaural Multi-speaker Speech Recognition
    Li, Song
    Ouyang, Beibei
    Tong, Fuchuan
    Liao, Dexin
    Li, Lin
    Hong, Qingyang
    [J]. INTERSPEECH 2021, 2021, : 3750 - 3754
  • [10] End-to-End Speaker-Attributed ASR with Transformer
    Kanda, Naoyuki
    Ye, Guoli
    Gaur, Yashesh
    Wang, Xiaofei
    Meng, Zhong
    Chen, Zhuo
    Yoshioka, Takuya
    [J]. INTERSPEECH 2021, 2021, : 4413 - 4417