AUXILIARY LOSS OF TRANSFORMER WITH RESIDUAL CONNECTION FOR END-TO-END SPEAKER DIARIZATION

Cited by: 7

Authors:

Yu, Yechan [1]

Park, Dongkeon [2]

Kim, Hong Kook [1,2]

Affiliations:

[1] Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea

[2] Gwangju Inst Sci & Technol, AI Grad Sch, Gwangju 61005, South Korea

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

speaker diarization; end-to-end neural diarization; auxiliary loss; residual connection

DOI:

10.1109/ICASSP43922.2022.9746602

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

End-to-end neural diarization (EEND) with self-attention directly predicts speaker labels from inputs and can handle overlapped speech. Although EEND outperforms clustering-based speaker diarization (SD), it cannot be further improved by simply increasing the number of encoder blocks, because the last encoder block is dominantly supervised compared with the lower blocks. This paper proposes a new residual auxiliary EEND (RX-EEND) learning architecture for transformers that forces the lower encoder blocks to learn more accurately. The auxiliary loss is applied to the output of each encoder block, including the last encoder block. The effect of the auxiliary loss on the learning of the encoder blocks can be further increased by adding a residual connection between the encoder blocks of the EEND. A performance evaluation and an ablation study show that the auxiliary loss in the proposed RX-EEND provides relative reductions in the diarization error rate (DER) of 50.3% and 21.0% on the simulated and CALLHOME (CH) datasets, respectively, compared with self-attentive EEND (SA-EEND). Furthermore, the residual connection used in RX-EEND yields a further relative DER reduction of 8.1% on the CH dataset.
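The two ideas in the abstract (a loss on every encoder block's output, and a residual connection between blocks) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names (`bce`, `pit_bce`, `auxiliary_loss`, `encode_with_residual`), the equal weighting of the per-block losses, and the use of a permutation-invariant binary cross-entropy are all assumptions made for illustration; the abstract does not specify the loss function or how the per-block terms are weighted.

```python
import math
from itertools import permutations


def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all frames and speakers.

    probs, labels: lists of frames, each frame a list with one value per speaker.
    """
    total, count = 0.0, 0
    for p_frame, y_frame in zip(probs, labels):
        for p, y in zip(p_frame, y_frame):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count


def pit_bce(probs, labels):
    """Assumed loss: BCE minimized over all orderings of the label speakers."""
    n_speakers = len(labels[0])
    return min(
        bce(probs, [[frame[i] for i in perm] for frame in labels])
        for perm in permutations(range(n_speakers))
    )


def auxiliary_loss(block_probs, labels):
    """Sum the loss over the output of every encoder block, last block included.

    Equal weights per block are an assumption of this sketch.
    """
    return sum(pit_bce(probs, labels) for probs in block_probs)


def encode_with_residual(frames, blocks):
    """Apply each block and add its input back to its output (residual link).

    blocks: hypothetical callables mapping a list of frames to a list of frames
    of the same shape. Returns the hidden state after every block, so a
    prediction head could be attached to each one.
    """
    hidden, per_block = frames, []
    for block in blocks:
        out = block(hidden)
        hidden = [[h + o for h, o in zip(hf, of)] for hf, of in zip(hidden, out)]
        per_block.append(hidden)
    return per_block
```

In a real model each entry returned by `encode_with_residual` would pass through a shared or per-block prediction head (for example a linear layer followed by a sigmoid) before being handed to `auxiliary_loss`; that head is omitted here to keep the sketch short.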


Pages: 8377-8381

Page count: 5

