AUXILIARY LOSS OF TRANSFORMER WITH RESIDUAL CONNECTION FOR END-TO-END SPEAKER DIARIZATION

Cited by: 7
Authors
Yu, Yechan [1 ]
Park, Dongkeon [2 ]
Kim, Hong Kook [1 ,2 ]
Affiliations
[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
[2] Gwangju Inst Sci & Technol, AI Grad Sch, Gwangju 61005, South Korea
Keywords
speaker diarization; end-to-end neural diarization; auxiliary loss; residual connection;
DOI
10.1109/ICASSP43922.2022.9746602
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206 ; 082403
Abstract
End-to-end neural diarization (EEND) with self-attention directly predicts speaker labels from the input and can handle overlapped speech. Although EEND outperforms clustering-based speaker diarization (SD), its performance cannot be improved further by simply increasing the number of encoder blocks, because the last encoder block is dominantly supervised compared with the lower blocks. This paper proposes a new residual auxiliary EEND (RX-EEND) learning architecture for transformers that forces the lower encoder blocks to learn more accurately. An auxiliary loss is applied to the output of each encoder block, including the last one. The effect of the auxiliary loss on the learning of the encoder blocks is further increased by adding a residual connection between the encoder blocks of the EEND. A performance evaluation and an ablation study reveal that the auxiliary loss in the proposed RX-EEND provides relative reductions in the diarization error rate (DER) of 50.3% and 21.0% on the simulated and CALLHOME (CH) datasets, respectively, compared with self-attentive EEND (SA-EEND). Furthermore, the residual connection used in RX-EEND reduces the DER by a further 8.1% relative on the CH dataset.
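To make the idea in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: a stack of transformer encoder blocks in which every block output feeds its own speaker-activity head (the auxiliary loss) and an extra residual connection is added between blocks. All names (RXEncoderStack, aux_heads, diarization_loss) and hyperparameters are illustrative assumptions, and the permutation-invariant training objective used in EEND is simplified here to plain BCE for brevity.

import torch
import torch.nn as nn

class RXEncoderStack(nn.Module):
    """Sketch of an encoder stack with per-block auxiliary heads and
    residual connections between blocks (illustrative, not the paper's code)."""
    def __init__(self, d_model=256, n_heads=4, n_blocks=4, n_speakers=2):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=1024, batch_first=True)
            for _ in range(n_blocks)
        ])
        # One linear head per block, so every block receives direct supervision.
        self.aux_heads = nn.ModuleList([
            nn.Linear(d_model, n_speakers) for _ in range(n_blocks)
        ])

    def forward(self, x):
        # x: (batch, frames, d_model) acoustic embeddings
        logits_per_block = []
        for block, head in zip(self.blocks, self.aux_heads):
            x = x + block(x)                  # extra residual connection between blocks
            logits_per_block.append(head(x))  # speaker-activity logits at this depth
        return logits_per_block               # last entry is the main output

def diarization_loss(logits_per_block, labels, aux_weight=1.0):
    # labels: (batch, frames, n_speakers) binary speaker-activity targets (float).
    # The real EEND objective is permutation-invariant; plain BCE keeps the sketch short.
    bce = nn.BCEWithLogitsLoss()
    main_loss = bce(logits_per_block[-1], labels)
    aux_loss = sum(bce(y, labels) for y in logits_per_block[:-1])
    return main_loss + aux_weight * aux_loss

# Usage with random tensors, to show the expected shapes:
model = RXEncoderStack()
feats = torch.randn(8, 500, 256)                    # (batch, frames, d_model)
labels = torch.randint(0, 2, (8, 500, 2)).float()   # (batch, frames, n_speakers)
loss = diarization_loss(model(feats), labels)
loss.backward()

Under this reading, supervising every block output spreads the gradient signal across the stack instead of concentrating it on the last block, which is the motivation the abstract gives for the auxiliary loss.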
Pages: 8377-8381
Page count: 5