END-TO-END DIARIZATION FOR VARIABLE NUMBER OF SPEAKERS WITH LOCAL-GLOBAL NETWORKS AND DISCRIMINATIVE SPEAKER EMBEDDINGS

Cited by: 12

Authors:

Maiti, Soumi [1,4]

Erdogan, Hakan [2]

Wilson, Kevin [2]

Wisdom, Scott [2]

Watanabe, Shinji [3]

Hershey, John R. [2]

Affiliations:

[1] CUNY, Grad Ctr, New York, NY 10010 USA

[2] Google Res, Mountain View, CA USA

[3] Johns Hopkins Univ, Baltimore, MD 21218 USA

[4] Google, Mountain View, CA 94043 USA

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

Diarization; attention; deep learning

DOI:

10.1109/ICASSP39728.2021.9414841

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative training, unlike traditional clustering-based diarization methods. The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions. We introduce several components that appear to help with diarization performance, including a local convolutional network followed by a global self-attention module, multi-task transfer learning using a speaker identification component, and a sequential approach where the model is refined with a second stage. These are trained and validated on simulated meeting data based on LibriSpeech and LibriTTS datasets; final evaluations are done using LibriCSS, which consists of simulated meetings recorded using real acoustics via loudspeaker playback. The proposed model performs better than previously proposed end-to-end diarization models on these data.
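The abstract mentions permutation-invariant cross-entropy losses without giving details. As a rough illustration only, the following is a minimal sketch of a generic fixed-speaker-count permutation-invariant binary cross-entropy, not the authors' variable-number formulation. The function names `bce` and `pit_bce` and the list-of-lists data layout are assumptions made for this example.

```python
import itertools
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two equal-length sequences."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def pit_bce(preds, targets):
    """Permutation-invariant BCE (hypothetical helper, fixed speaker count).

    preds, targets: lists of S per-speaker activity sequences, each of length T.
    Returns (loss, perm), where perm[i] is the index of the reference speaker
    matched to model output i under the lowest-loss assignment.
    """
    n = len(preds)
    best_loss, best_perm = float("inf"), None
    # Exhaustive search over speaker orderings; practical only for small S.
    for perm in itertools.permutations(range(n)):
        loss = sum(bce(preds[i], targets[j]) for i, j in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: the outputs are in the opposite order to the reference labels,
# so the lowest-loss assignment swaps the two speakers.
preds = [[0.9, 0.1, 0.8], [0.1, 0.9, 0.2]]
targets = [[0, 1, 0], [1, 0, 1]]
loss, perm = pit_bce(preds, targets)
```

Because the loss is minimized over all output-to-reference assignments, reordering the reference speakers leaves its value unchanged. This is why such losses suit diarization, where speaker labels have no canonical order.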

Citation

Pages: 7183-7187

Page count: 5
