End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning

Cited by: 12

Authors:

Denisov, Pavel [1]

Vu, Ngoc Thang [1]

Affiliation:

[1] Univ Stuttgart, Inst Nat Language Proc IMS, Stuttgart, Germany

Source:

INTERSPEECH 2019 | 2019

Keywords:

end-to-end ASR; overlapped speech

DOI:

10.21437/Interspeech.2019-1130

Chinese Library Classification (CLC) numbers:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

This paper presents our latest investigation on end-to-end automatic speech recognition (ASR) for overlapped speech. We propose to train an end-to-end system conditioned on speaker embeddings and further improved by transfer learning from clean speech. This proposed framework does not require any parallel non-overlapped speech materials and is independent of the number of speakers. Our experimental results on overlapped speech datasets show that joint conditioning on speaker embeddings and transfer learning significantly improves the ASR performance.
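The abstract names two ingredients: conditioning the end-to-end model on a speaker embedding, and transfer learning from a model trained on clean speech. The following is a minimal sketch of one common way each could be realized, not the authors' implementation. It assumes (1) the target speaker's embedding is appended to every input frame of the overlapped mixture, and (2) transfer learning means initializing every parameter of the new model whose name and shape match the clean-speech model. The functions `condition_on_speaker` and `transfer_matching_params`, the `Params` layout, and the parameter names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter store: name -> (shape, flattened values).
Params = Dict[str, Tuple[Tuple[int, ...], List[float]]]


def condition_on_speaker(
    features: List[List[float]], speaker_embedding: List[float]
) -> List[List[float]]:
    """Append the same target-speaker embedding to every acoustic frame.

    features: T frames of the overlapped mixture, each with F values.
    speaker_embedding: one D-dimensional vector for the target speaker.
    Returns T frames with F + D values each.
    """
    if not features:
        return []
    dim = len(features[0])
    if any(len(frame) != dim for frame in features):
        raise ValueError("all frames must have the same feature dimension")
    return [list(frame) + list(speaker_embedding) for frame in features]


def transfer_matching_params(
    pretrained: Params, target: Params
) -> Tuple[Params, List[str]]:
    """Initialize `target` from a clean-speech model where names and shapes match.

    Parameters whose shape changed (e.g. the input layer, which now sees
    F + D values per frame) keep their fresh initialization.
    Returns the merged parameters and the names that were copied.
    """
    merged = dict(target)
    copied: List[str] = []
    for name, (shape, values) in pretrained.items():
        if name in target and target[name][0] == shape:
            merged[name] = (shape, list(values))
            copied.append(name)
    return merged, copied


if __name__ == "__main__":
    mixture = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames, F = 2
    target_speaker = [1.0, -1.0]                     # D = 2
    print(condition_on_speaker(mixture, target_speaker)[0])
    # [0.1, 0.2, 1.0, -1.0]

    clean_model = {"enc.in": ((2, 2), [1.0] * 4), "enc.rnn": ((2, 2), [2.0] * 4)}
    new_model = {"enc.in": ((2, 4), [0.0] * 8), "enc.rnn": ((2, 2), [0.0] * 4)}
    _, copied = transfer_matching_params(clean_model, new_model)
    print(copied)  # ['enc.rnn']
```

Because the conditioned input grows from F to F + D values per frame, the input layer's weight shape no longer matches the clean-speech model, so in this sketch it is the only layer left untransferred. The abstract does not say how the paper itself handles that layer, and because the embedding selects the target speaker, the same model can be run once per speaker, which is consistent with the claim that the framework is independent of the number of speakers.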


Pages: 4425–4429

Number of pages: 5
