END-TO-END DIARIZATION FOR VARIABLE NUMBER OF SPEAKERS WITH LOCAL-GLOBAL NETWORKS AND DISCRIMINATIVE SPEAKER EMBEDDINGS

Cited by: 12

Authors:

Maiti, Soumi [1,4]

Erdogan, Hakan [2]

Wilson, Kevin [2]

Wisdom, Scott [2]

Watanabe, Shinji [3]

Hershey, John R. [2]

Affiliations:

[1] CUNY, Grad Ctr, New York, NY 10010 USA

[2] Google Res, Mountain View, CA USA

[3] Johns Hopkins Univ, Baltimore, MD 21218 USA

[4] Google, Mountain View, CA 94043 USA

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

Diarization; attention; deep learning;

DOI:

10.1109/ICASSP39728.2021.9414841

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative training, unlike traditional clustering-based diarization methods. The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions. We introduce several components that appear to help with diarization performance, including a local convolutional network followed by a global self-attention module, multi-task transfer learning using a speaker identification component, and a sequential approach where the model is refined with a second stage. These are trained and validated on simulated meeting data based on LibriSpeech and LibriTTS datasets; final evaluations are done using LibriCSS, which consists of simulated meetings recorded using real acoustics via loudspeaker playback. The proposed model performs better than previously proposed end-to-end diarization models on these data.
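As an illustration of the permutation-invariant cross-entropy idea mentioned in the abstract, the following is a minimal sketch for a fixed number of speakers. It is not the authors' implementation: the function names, the toy data, and the brute-force search over permutations are all assumptions made for illustration, and the variable-speaker-count handling described in the abstract is not shown.

```python
import itertools
import math


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between per-frame probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def permutation_invariant_bce(pred, ref):
    """Return (loss, best_perm) minimised over speaker orderings.

    pred, ref: lists of speaker tracks, each a list of per-frame values.
    best_perm[i] is the index of the reference track matched to pred[i].
    Hypothetical helper for illustration; brute force is only practical
    for a small number of speakers.
    """
    n_spk = len(pred)
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n_spk)):
        loss = sum(
            binary_cross_entropy(pred[i], ref[perm[i]]) for i in range(n_spk)
        ) / n_spk
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data (assumed): two speakers, four frames, with overlap in frame 2.
# The prediction lists the two speakers in the opposite order to the reference.
ref = [[1, 1, 0, 0], [0, 1, 1, 0]]
pred = [[0.1, 0.9, 0.8, 0.1], [0.9, 0.8, 0.2, 0.1]]
loss, perm = permutation_invariant_bce(pred, ref)
print(perm, round(loss, 3))
```

Because the loss is taken as the minimum over speaker orderings, the model is not penalised for emitting speakers in a different order from the reference labels, and overlapping speech is represented simply by more than one track being active in the same frame.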


Pages: 7183-7187

Page count: 5

