End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

Cited by: 70

Authors:

Horiguchi, Shota [1]

Fujita, Yusuke [1]

Watanabe, Shinji [2]

Xue, Yawen [1]

Nagamatsu, Kenji [1]

Affiliations:

[1] Hitachi Ltd, Tokyo, Japan

[2] Johns Hopkins Univ, Baltimore, MD 21218 USA

Source:

INTERSPEECH 2020 | 2020

Keywords:

speaker diarization; encoder-decoder; attractor calculation

DOI:

10.21437/Interspeech.2020-1022

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

End-to-end speaker diarization for an unknown number of speakers is addressed in this paper. The recently proposed end-to-end speaker diarization approach outperformed conventional clustering-based speaker diarization, but it has one drawback: it is less flexible in terms of the number of speakers. This paper proposes a method for encoder-decoder based attractor calculation (EDA), which first generates a flexible number of attractors from a speech embedding sequence. Then, the generated attractors are multiplied by the speech embedding sequence to produce the same number of speaker activities. The speech embedding sequence is extracted using the conventional self-attentive end-to-end neural speaker diarization (SA-EEND) network. In a two-speaker condition, our method achieved a 2.69% diarization error rate (DER) on simulated mixtures and an 8.07% DER on the two-speaker subset of CALLHOME, while vanilla SA-EEND attained 4.56% and 9.54%, respectively. In conditions with an unknown number of speakers, our method attained a 15.29% DER on CALLHOME, while the x-vector-based clustering method achieved a 19.43% DER.
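The multiplication step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the attractors here are random stand-ins for the output of the encoder-decoder, and the function name `speaker_activities`, the sigmoid nonlinearity, and the 0.5 decision threshold are assumptions made for illustration. The sketch only shows that the number of output activity tracks follows the number of attractors.

```python
import numpy as np


def speaker_activities(embeddings, attractors, threshold=0.5):
    """Turn frame embeddings and attractors into per-speaker activities.

    embeddings: array of shape (T, D), one embedding per frame.
    attractors: array of shape (S, D), one attractor per speaker.
    Returns (posteriors, decisions), both of shape (T, S).
    The sigmoid and the threshold are illustrative assumptions.
    """
    logits = embeddings @ attractors.T           # (T, S) dot products
    posteriors = 1.0 / (1.0 + np.exp(-logits))   # squash to (0, 1)
    decisions = posteriors > threshold           # frame-wise on/off per speaker
    return posteriors, decisions


rng = np.random.default_rng(0)
T, D = 6, 4                                      # hypothetical sizes
embeddings = rng.standard_normal((T, D))         # stand-in for SA-EEND output

for num_speakers in (2, 3):
    # Stand-in for the attractors an encoder-decoder would generate.
    attractors = rng.standard_normal((num_speakers, D))
    posteriors, decisions = speaker_activities(embeddings, attractors)
    print(num_speakers, posteriors.shape)
```

Because the output has one column per attractor, the same embedding sequence yields two activity tracks for two attractors and three for three, with no change to the rest of the computation.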

Pages: 269-273

Page count: 5
