END-TO-END SPEAKER DIARIZATION CONDITIONED ON SPEECH ACTIVITY AND OVERLAP DETECTION

Cited by: 13

Authors:

Takashima, Yuki ^{[1]}

Fujita, Yusuke ^{[1]}

Watanabe, Shinji ^{[2]}

Horiguchi, Shota ^{[1]}

Garcia, Paola ^{[2]}

Nagamatsu, Kenji ^{[1]}

Affiliations:

[1] Hitachi Ltd, Res & Dev Grp, Tokyo, Japan

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

Source:

2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021

Keywords:

speaker diarization; multitask learning; chain rule; neural network; end-to-end

DOI:

10.1109/SLT48900.2021.9383555

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. In this paper, to further improve the performance of the EEND system, we propose a novel multitask learning framework that solves speaker diarization and a desired subtask while explicitly considering the task dependency. We optimize speaker diarization conditioned on speech activity and overlap detection that are subtasks of speaker diarization, based on the probabilistic chain rule. Experimental results show that our proposed method can leverage a subtask to effectively model speaker diarization, and outperforms conventional EEND systems in terms of diarization error rate.
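The abstract treats speech activity detection and overlap detection as subtasks of speaker diarization. As a minimal sketch of that relationship (not the authors' implementation), the frame-level labels for both subtasks can be derived from a diarization label matrix. The function name `derive_subtask_labels`, the list-of-lists label layout, and the toy input are assumptions made for illustration only.

```python
from typing import List, Tuple


def derive_subtask_labels(diar: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Derive frame-level subtask labels from diarization labels.

    diar[t][s] is 1 if (hypothetical) speaker s is active in frame t, else 0.
    Returns (speech_activity, overlap), one 0/1 label per frame.
    """
    # Speech activity: at least one speaker is active in the frame.
    speech_activity = [1 if sum(frame) >= 1 else 0 for frame in diar]
    # Overlap: at least two speakers are active in the frame.
    overlap = [1 if sum(frame) >= 2 else 0 for frame in diar]
    return speech_activity, overlap


# Toy example: 4 frames, 2 speakers (illustrative data, not from the paper).
labels = [[0, 0], [1, 0], [1, 1], [0, 1]]
sad, ovl = derive_subtask_labels(labels)
print(sad)  # [0, 1, 1, 1]
print(ovl)  # [0, 0, 1, 0]
```

Because both label sequences are deterministic functions of the diarization labels, they come at no extra annotation cost, which is what makes them natural candidates for a conditioning subtask.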


Pages: 849-856

Page count: 8

