END-TO-END SPEAKER DIARIZATION CONDITIONED ON SPEECH ACTIVITY AND OVERLAP DETECTION

Times cited: 13

Authors:

Takashima, Yuki [1]

Fujita, Yusuke [1]

Watanabe, Shinji [2]

Horiguchi, Shota [1]

Garcia, Paola [2]

Nagamatsu, Kenji [1]

Affiliations:

[1] Hitachi Ltd, Res & Dev Grp, Tokyo, Japan

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

Source:

2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021

Keywords:

speaker diarization; multitask learning; chain rule; neural network; end-to-end

DOI:

10.1109/SLT48900.2021.9383555

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). EEND systems have shown promising performance compared with traditional clustering-based methods, especially on overlapping speech. To further improve EEND performance, we propose a novel multitask learning framework that solves speaker diarization together with a desired subtask while explicitly modeling the dependency between the two tasks. Based on the probabilistic chain rule, we optimize speaker diarization conditioned on speech activity and overlap detection, which are subtasks of speaker diarization. Experimental results show that the proposed method can leverage a subtask to model speaker diarization effectively and that it outperforms conventional EEND systems in terms of diarization error rate.
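The chain-rule idea in the abstract can be illustrated with a small loss computation. This is a minimal sketch, not the authors' implementation. It assumes frame-level 0/1 speaker labels Y, defines speech activity as "at least one speaker active" and overlap as "at least two speakers active", and writes the joint objective as -log P(Y, z | X) = -log P(z | X) - log P(Y | z, X). The diarization term is a summed permutation-invariant binary cross-entropy. All function names (`subtask_labels`, `bce_sum`, `pit_bce_sum`, `chain_rule_loss`) are hypothetical.

```python
import itertools

import numpy as np


def subtask_labels(y):
    """Derive frame-level subtask targets from speaker activity labels.

    y: (T, S) 0/1 array; y[t, s] = 1 if speaker s is active in frame t.
    Returns (speech_activity, overlap) as (T,) float arrays:
    speech activity = at least one speaker active, overlap = at least two.
    """
    n_active = y.sum(axis=1)
    return (n_active >= 1).astype(float), (n_active >= 2).astype(float)


def bce_sum(p, y, eps=1e-7):
    """Summed binary cross-entropy, i.e. -log P(y) for independent Bernoullis."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def pit_bce_sum(p, y):
    """Permutation-invariant BCE: try every speaker ordering, keep the best."""
    n_spk = y.shape[1]
    return min(
        bce_sum(p[:, list(perm)], y)
        for perm in itertools.permutations(range(n_spk))
    )


def chain_rule_loss(p_sub, p_diar, y, subtask="overlap"):
    """Joint loss -log P(Y, z | X) = -log P(z | X) - log P(Y | z, X).

    p_sub:  (T,) subtask posteriors P(z_t = 1 | X).
    p_diar: (T, S) speaker posteriors from a model that was given the subtask
            labels z (assumed here to be ground truth during training).
    """
    speech, overlap = subtask_labels(y)
    z = overlap if subtask == "overlap" else speech
    return bce_sum(p_sub, z) + pit_bce_sum(p_diar, y)


if __name__ == "__main__":
    # Toy example: 4 frames, 2 speakers (frame 2 overlaps, frame 3 is silence).
    y = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
    p_diar = np.array([[0.9, 0.2], [0.8, 0.7], [0.1, 0.1], [0.3, 0.6]])
    p_overlap = np.array([0.1, 0.9, 0.2, 0.3])
    print(subtask_labels(y))
    print(chain_rule_loss(p_overlap, p_diar, y, subtask="overlap"))
```

Because both subtask targets are deterministic functions of the speaker labels, this sketch never needs extra annotation. Summing rather than averaging each term keeps the total an exact negative log-likelihood under the stated independence assumptions.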


Pages: 849-856

Number of pages: 8
