END-TO-END SPEAKER DIARIZATION CONDITIONED ON SPEECH ACTIVITY AND OVERLAP DETECTION

Cited by: 13

Authors:

Takashima, Yuki [1]

Fujita, Yusuke [1]

Watanabe, Shinji [2]

Horiguchi, Shota [1]

Garcia, Paola [2]

Nagamatsu, Kenji [1]

Affiliations:

[1] Hitachi Ltd, Res & Dev Grp, Tokyo, Japan

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

Source:

2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021

Keywords:

speaker diarization; multitask learning; chain rule; neural network; end-to-end;

DOI:

10.1109/SLT48900.2021.9383555

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. To further improve the performance of the EEND system, we propose a novel multitask learning framework that solves speaker diarization and a desired subtask while explicitly considering the task dependency. Based on the probabilistic chain rule, we optimize speaker diarization conditioned on speech activity detection and overlap detection, which are subtasks of speaker diarization. Experimental results show that our proposed method can leverage a subtask to effectively model speaker diarization and that it outperforms conventional EEND systems in terms of diarization error rate.
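
The chain-rule conditioning described in the abstract can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the paper: X stands for the acoustic feature sequence, S for the frame-wise labels of a subtask (speech activity or overlap), and Y for the frame-wise speaker labels. The first line is the standard probabilistic chain rule; the second is one plausible training objective that follows from it, not necessarily the loss the authors use.

```latex
% Chain rule: the joint over subtask labels S and diarization labels Y
% factorizes into a subtask term and a conditional diarization term.
P(Y, S \mid X) = P(S \mid X)\, P(Y \mid S, X)

% Assumed multitask objective: negative log-likelihood of the joint,
% which splits into a subtask loss and a conditioned diarization loss.
\mathcal{L} = -\log P(S \mid X) - \log P(Y \mid S, X)
```

Under this reading, the diarization output is predicted with the subtask labels as an additional input, which is what distinguishes conditional multitask learning from predicting both outputs independently from X.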

Pages: 849-856

Number of pages: 8
