AUXILIARY LOSS OF TRANSFORMER WITH RESIDUAL CONNECTION FOR END-TO-END SPEAKER DIARIZATION

Cited by: 7
Authors
Yu, Yechan [1 ]
Park, Dongkeon [2 ]
Kim, Hong Kook [1 ,2 ]
Affiliations
[1] Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
[2] Gwangju Inst Sci & Technol, AI Grad Sch, Gwangju 61005, South Korea
Keywords
speaker diarization; end-to-end neural diarization; auxiliary loss; residual connection
DOI
10.1109/ICASSP43922.2022.9746602
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
End-to-end neural diarization (EEND) with self-attention predicts speaker labels directly from the input and can handle overlapped speech. Although EEND outperforms clustering-based speaker diarization (SD), simply increasing the number of encoder blocks does not improve it further, because the last encoder block receives most of the supervision compared with the lower blocks. This paper proposes a new residual auxiliary EEND (RX-EEND) learning architecture for transformers that pushes the lower encoder blocks to learn more accurately. An auxiliary loss is applied to the output of each encoder block, including the last one. The effect of the auxiliary loss on the learning of the encoder blocks can be increased further by adding a residual connection between the encoder blocks of the EEND. Performance evaluation and an ablation study show that the auxiliary loss in the proposed RX-EEND yields relative reductions in diarization error rate (DER) of 50.3% and 21.0% on the simulated and CALLHOME (CH) datasets, respectively, compared with self-attentive EEND (SA-EEND). The residual connection used in RX-EEND yields a further relative DER reduction of 8.1% on the CH dataset.
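The two ideas in the abstract, a loss on every encoder block's output and a residual connection between blocks, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The names `run_encoder`, `speaker_probs`, `bce`, and `auxiliary_loss` are hypothetical, the encoder blocks are toy functions rather than transformer layers, the residual form (block input added to block output) and the plain average over per-block losses are assumptions, and the permutation handling a real EEND loss needs is left out.

```python
import math
from typing import Callable, List, Tuple

Frames = List[List[float]]            # [time][feature]
Block = Callable[[Frames], Frames]    # hypothetical stand-in for an encoder block


def run_encoder(frames: Frames, blocks: List[Block], residual: bool) -> List[Frames]:
    """Run the blocks in order and return the output of every block."""
    outputs = []
    hidden = frames
    for block in blocks:
        out = block(hidden)
        if residual:
            # Assumed residual form: add the block input to the block output.
            out = [[o + h for o, h in zip(o_row, h_row)]
                   for o_row, h_row in zip(out, hidden)]
        hidden = out
        outputs.append(hidden)
    return outputs


def speaker_probs(hidden: Frames) -> Frames:
    """Toy shared output head: treat each feature as one speaker's logit."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in hidden]


def bce(probs: Frames, labels: Frames) -> float:
    """Mean binary cross-entropy over all frames and speakers."""
    eps = 1e-7
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)
            total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
            count += 1
    return total / count


def auxiliary_loss(block_outputs: List[Frames], labels: Frames) -> Tuple[float, List[float]]:
    """Supervise every block, including the last; the plain average is an assumption."""
    per_block = [bce(speaker_probs(h), labels) for h in block_outputs]
    return sum(per_block) / len(per_block), per_block


# Toy usage: two identical blocks that add 1 to every feature.
shift = lambda h: [[v + 1.0 for v in row] for row in h]
frames = [[0.0, 0.0]]    # one frame, two features (one per speaker)
labels = [[1.0, 1.0]]    # both speakers active in that frame
outputs = run_encoder(frames, [shift, shift], residual=True)
total, per_block = auxiliary_loss(outputs, labels)
```

Because `auxiliary_loss` sums a term for each block, every block's output is compared with the labels directly, rather than only through the last block.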
Pages: 8377-8381
Page count: 5