END-TO-END DIARIZATION FOR VARIABLE NUMBER OF SPEAKERS WITH LOCAL-GLOBAL NETWORKS AND DISCRIMINATIVE SPEAKER EMBEDDINGS

Cited by: 12
Authors
Maiti, Soumi [1, 4]
Erdogan, Hakan [2]
Wilson, Kevin [2]
Wisdom, Scott [2]
Watanabe, Shinji [3]
Hershey, John R. [2]
Affiliations
[1] CUNY Graduate Center, New York, NY 10010 USA
[2] Google Research, Mountain View, CA USA
[3] Johns Hopkins University, Baltimore, MD 21218 USA
[4] Google, Mountain View, CA 94043 USA
Keywords
Diarization; attention; deep learning
DOI
10.1109/ICASSP39728.2021.9414841
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative training, unlike traditional clustering-based diarization methods. The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions. We introduce several components that appear to help with diarization performance, including a local convolutional network followed by a global self-attention module, multi-task transfer learning using a speaker identification component, and a sequential approach where the model is refined with a second stage. These are trained and validated on simulated meeting data based on LibriSpeech and LibriTTS datasets; final evaluations are done using LibriCSS, which consists of simulated meetings recorded using real acoustics via loudspeaker playback. The proposed model performs better than previously proposed end-to-end diarization models on these data.
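The abstract mentions variable-number permutation-invariant cross-entropy loss functions. The Python sketch below shows the general idea of a permutation-invariant binary cross-entropy over frame-level speaker-activity tracks. It is not the authors' exact formulation, and it rests on several assumptions that do not come from the abstract:

- The model emits a fixed maximum number of output tracks, S_max, each holding per-frame speech probabilities.
- A variable number of reference speakers is handled by padding the reference with all-silent tracks up to S_max.
- The best speaker-to-track assignment is found by brute-force search.

The names `bce` and `pit_bce_loss` are hypothetical.

```python
import itertools
import math
from typing import Sequence, Tuple


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one frame of one speaker track."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def pit_bce_loss(pred: Sequence[Sequence[float]],
                 ref: Sequence[Sequence[int]]) -> Tuple[float, Tuple[int, ...]]:
    """Permutation-invariant BCE over speaker tracks (illustrative sketch).

    pred: S_max output tracks, each a list of T per-frame speech probabilities.
    ref:  N <= S_max reference tracks, each a list of T binary activity labels.
    Assumption: references are padded with all-silent tracks up to S_max.
    Returns the minimum mean loss and the best permutation, where perm[k]
    is the output track assigned to (padded) reference track k.
    """
    s_max, n = len(pred), len(ref)
    if n > s_max:
        raise ValueError("more reference speakers than output tracks")
    t = len(pred[0])
    padded = [list(r) for r in ref] + [[0] * t for _ in range(s_max - n)]

    best_loss, best_perm = math.inf, tuple(range(s_max))
    # Brute force over all S_max! assignments of output tracks to references.
    for perm in itertools.permutations(range(s_max)):
        total = sum(bce(pred[perm[k]][i], padded[k][i])
                    for k in range(s_max) for i in range(t))
        loss = total / (s_max * t)  # mean over tracks and frames
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


if __name__ == "__main__":
    # Three output tracks, two reference speakers, three frames.
    pred = [[0.9, 0.1, 0.1], [0.1, 0.8, 0.9], [0.05, 0.05, 0.05]]
    ref = [[0, 1, 1], [1, 0, 0]]
    loss, perm = pit_bce_loss(pred, ref)
    print(perm, round(loss, 4))  # prints (1, 0, 2) 0.1004
```

Here the third output track matches the padded silent reference. Brute-force search costs S_max! loss evaluations, which is acceptable for a handful of output tracks. For larger S_max, the same assignment can be solved with the Hungarian algorithm on a pairwise track-to-reference loss matrix.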
Pages: 7183-7187
Number of pages: 5