END-TO-END SPEAKER DIARIZATION CONDITIONED ON SPEECH ACTIVITY AND OVERLAP DETECTION

Cited by: 13
Authors
Takashima, Yuki [1]
Fujita, Yusuke [1]
Watanabe, Shinji [2]
Horiguchi, Shota [1]
Garcia, Paola [2]
Nagamatsu, Kenji [1]
Affiliations
[1] Hitachi, Ltd., Research & Development Group, Tokyo, Japan
[2] Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD 21218, USA
Keywords
speaker diarization; multitask learning; chain rule; neural network; end-to-end
DOI
10.1109/SLT48900.2021.9383555
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). EEND has shown promising performance compared with traditional clustering-based methods, especially on overlapping speech. To further improve EEND, we propose a novel multitask learning framework that solves speaker diarization together with a desired subtask while explicitly modeling the dependency between the two tasks. Based on the probabilistic chain rule, we optimize speaker diarization conditioned on speech activity detection and overlap detection, both of which are subtasks of speaker diarization. Experimental results show that the proposed method can leverage a subtask to model speaker diarization effectively and outperforms conventional EEND systems in terms of diarization error rate (DER).
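The conditioning idea in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation; it only shows, under stated assumptions, (a) how speech-activity and overlap targets follow from frame-level multi-speaker labels (speech activity when at least one speaker is active, overlap when two or more are), and (b) how the chain rule splits the joint negative log-likelihood as -log P(Y, z | X) = -log P(z | X) - log P(Y | z, X), with each factor approximated by a summed binary cross-entropy. The label layout and the names `subtask_targets`, `bce_sum`, and `chain_rule_loss` are hypothetical. For brevity, the sketch assumes the speaker order of `diar_probs` already matches the labels; EEND normally resolves that ambiguity with a permutation-invariant loss.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: labels[t][s] == 1 if speaker s is active in frame t
Labels = List[List[int]]


def subtask_targets(labels: Labels) -> Tuple[List[int], List[int]]:
    """Derive frame-level subtask targets from multi-speaker labels.

    Speech activity: at least one speaker is active in the frame.
    Overlap: two or more speakers are active in the frame.
    """
    sad = [1 if sum(frame) >= 1 else 0 for frame in labels]
    ovl = [1 if sum(frame) >= 2 else 0 for frame in labels]
    return sad, ovl


def bce_sum(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Summed binary cross-entropy, i.e. the negative log-likelihood of
    independent Bernoulli targets; probabilities are clamped to [eps, 1 - eps]."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def chain_rule_loss(
    sub_probs: Sequence[float],
    sub_targets: Sequence[int],
    diar_probs: Sequence[Sequence[float]],
    diar_targets: Labels,
) -> float:
    """-log P(Y, z | X) = -log P(z | X) - log P(Y | z, X).

    sub_probs: per-frame subtask posteriors (speech activity or overlap).
    diar_probs: per-frame, per-speaker posteriors from a diarization model that
    is assumed (hypothetically) to receive the subtask output z as an extra input.
    Frames and speakers are treated as conditionally independent.
    """
    sub_nll = bce_sum(sub_probs, sub_targets)
    flat_p = [p for frame in diar_probs for p in frame]
    flat_y = [y for frame in diar_targets for y in frame]
    diar_nll = bce_sum(flat_p, flat_y)
    return sub_nll + diar_nll


if __name__ == "__main__":
    labels = [[0, 0], [1, 0], [1, 1], [0, 1]]
    sad, ovl = subtask_targets(labels)
    print(sad, ovl)  # [0, 1, 1, 1] [0, 0, 1, 0]
```

Summing rather than averaging keeps the two terms an exact chain-rule decomposition of the joint log-likelihood under the independence assumption; a practical system may instead weight or normalize the terms differently.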
Pages: 849-856
Number of pages: 8
Related papers (50 in total)
  • [21] EEND-SS: JOINT END-TO-END NEURAL SPEAKER DIARIZATION AND SPEECH SEPARATION FOR FLEXIBLE NUMBER OF SPEAKERS
    Maiti, Soumi
    Ueda, Yushi
    Watanabe, Shinji
    Zhang, Chunlei
    Yu, Meng
    Zhang, Shi-Xiong
    Xu, Yong
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 480 - 487
  • [22] CONTINUAL SELF-SUPERVISED DOMAIN ADAPTATION FOR END-TO-END SPEAKER DIARIZATION
    Coria, Juan M.
    Bredin, Herve
    Ghannay, Sahar
    Rosset, Sophie
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 626 - 632
  • [23] End-to-end speaker segmentation for overlap-aware resegmentation
    Bredin, Herve
    Laurent, Antoine
    INTERSPEECH 2021, 2021, : 3111 - 3115
  • [24] Wavesplit: End-to-End Speech Separation by Speaker Clustering
    Zeghidour, Neil
    Grangier, David
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2840 - 2849
  • [25] SPEAKER ADAPTATION FOR MULTICHANNEL END-TO-END SPEECH RECOGNITION
    Ochiai, Tsubasa
    Watanabe, Shinji
    Katagiri, Shigeru
    Hori, Takaaki
    Hershey, John
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6707 - 6711
  • [26] Speaker voice normalization for end-to-end speech translation
    Xue, Zhengshan
    Shi, Tingxun
    Zhang, Xiaolei
    Xiong, Deyi
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 248
  • [27] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION
    Settle, Shane
    Le Roux, Jonathan
    Hori, Takaaki
    Watanabe, Shinji
    Hershey, John R.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4819 - 4823
  • [28] End-to-End Language Diarization for Bilingual Code-Switching Speech
    Liu, Hexin
    Perera, Leibny Paola Garcia
    Zhang, Xinyi
    Dauwels, Justin
    Khong, Andy W. H.
    Khudanpur, Sanjeev
    Styles, Suzy J.
    INTERSPEECH 2021, 2021, : 1489 - 1493
  • [29] Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization
    Fujita, Yusuke
    Ogawa, Tetsuji
    Kobayashi, Tetsunori
    IEEE ACCESS, 2023, 11 : 140069 - 140076
  • [30] A study on end-to-end speaker diarization system using single-label classification
    Jung, Jaehee
    Kim, Wooil
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2023, 42 (06): : 536 - 543