END-TO-END SPEAKER DIARIZATION CONDITIONED ON SPEECH ACTIVITY AND OVERLAP DETECTION

被引:13
|
作者
Takashima, Yuki [1 ]
Fujita, Yusuke [1 ]
Watanabe, Shinji [2 ]
Horiguchi, Shota [1 ]
Garcia, Paola [2 ]
Nagamatsu, Kenji [1 ]
机构
[1] Hitachi Ltd, Res & Dev Grp, Tokyo, Japan
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
关键词
speaker diarization; multitask learning; chain rule; neural network; end-to-end;
D O I
10.1109/SLT48900.2021.9383555
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. In this paper, to further improve the performance of the EEND system, we propose a novel multitask learning framework that solves speaker diarization and a desired subtask while explicitly considering the task dependency. We optimize speaker diarization conditioned on speech activity and overlap detection that are subtasks of speaker diarization, based on the probabilistic chain rule. Experimental results show that our proposed method can leverage a subtask to effectively model speaker diarization, and outperforms conventional EEND systems in terms of diarization error rate.
引用
收藏
页码:849 / 856
页数:8
相关论文
共 50 条
  • [1] DIVE: END-TO-END SPEECH DIARIZATION VIA ITERATIVE SPEAKER EMBEDDING
    Zeghidour, Neil
    Teboul, Olivier
    Grangier, David
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 702 - 709
  • [2] OVERLAP-AWARE DIARIZATION: RESEGMENTATION USING NEURAL END-TO-END OVERLAPPED SPEECH DETECTION
    Bullock, Latane
    Bredin, Herve
    Garcia-Perera, Leibny Paola
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7114 - 7118
  • [3] END-TO-END SPEAKER DIARIZATION AS POST-PROCESSING
    Horiguchi, Shota
    Garcia, Paola
    Fujita, Yusuke
    Watanabe, Shinji
    Nagamatsu, Kenji
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7188 - 7192
  • [4] TOWARDS END-TO-END SPEAKER DIARIZATION WITH GENERALIZED NEURAL SPEAKER CLUSTERING
    Zhang, Chunlei
    Shi, Jiatong
    Weng, Chao
    Yu, Meng
    Yu, Dong
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8372 - 8376
  • [5] END-TO-END NEURAL SPEAKER DIARIZATION WITH SELF-ATTENTION
    Fujita, Yusuke
    Kanda, Naoyuki
    Horiguchi, Shota
    Xue, Yawen
    Nagamatsu, Kenji
    Watanabe, Shinji
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 296 - 303
  • [6] End-to-End Audio-Visual Neural Speaker Diarization
    He, Mao-kui
    Du, Jun
    Lee, Chin-Hui
    [J]. INTERSPEECH 2022, 2022, : 1461 - 1465
  • [7] Robust End-to-end Speaker Diarization with Generic Neural Clustering
    Yang, Chenyu
    Wang, Yu
    [J]. INTERSPEECH 2022, 2022, : 1471 - 1475
  • [8] Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection
    Moriya, Takafumi
    Sato, Hiroshi
    Ochiai, Tsubasa
    Delcroix, Marc
    Shinozaki, Takahiro
    [J]. IEEE ACCESS, 2023, 11 : 13906 - 13917
  • [9] OVERLAP-AWARE LOW-LATENCY ONLINE SPEAKER DIARIZATION BASED ON END-TO-END LOCAL SEGMENTATION
    Coria, Juan M.
    Bredin, Herve
    Ghannay, Sahar
    Rosset, Sophie
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 1139 - 1146
  • [10] End-to-End Neural Speaker Diarization With Non-Autoregressive Attractors
    Rybicka, Magdalena
    Villalba, Jesus
    Thebaud, Thomas
    Dehak, Najim
    Kowalczyk, Konrad
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3960 - 3973