Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization

Cited by: 1
Authors
Jeoung, Ye-Rin [1 ]
Choi, Jeong-Hwan [1 ]
Seong, Ju-Seok [1 ]
Kyung, JeHyun [1 ]
Chang, Joon-Hyuk [1 ]
Affiliations
[1] Hanyang Univ, Dept Elect Engn, Seoul, South Korea
Keywords
speaker diarization; end-to-end neural diarization; self-attention mechanism; fine-tuning; self-distillation;
DOI
10.21437/Interspeech.2023-1404
CLC Classification
O42 [Acoustics];
Discipline Codes
070206; 082403
Abstract
In this study, we explore self-distillation (SD) techniques to improve the performance of the transformer-encoder-based self-attentive (SA) end-to-end neural speaker diarization (EEND) model. We first apply SD approaches introduced in the automatic speech recognition field to the SA-EEND model to confirm their potential for speaker diarization. Then, we propose two novel SD methods for SA-EEND, which distill either the prediction output of the model or the SA heads of the upper blocks into the SA heads of the lower blocks. Consequently, we expect the high-level speaker-discriminative knowledge learned by the upper blocks to be shared across the lower blocks, thereby enabling the SA heads of the lower blocks to effectively capture the discriminative patterns of overlapped speech from multiple speakers. Experimental results on the simulated and CALLHOME datasets show that SD generally improves the baseline performance and that the proposed methods outperform the conventional SD approaches.
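For intuition, the head-to-head variant described in the abstract can be pictured as an auxiliary loss that pulls a lower block's attention distributions toward those of an upper block. Below is a minimal PyTorch sketch of that idea; the function name head_distillation_loss, the KL form of the loss, and the tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def head_distillation_loss(attn_lower: torch.Tensor,
                           attn_upper: torch.Tensor) -> torch.Tensor:
    """Illustrative head-to-head self-distillation loss (an assumed
    formulation, not necessarily the paper's exact loss).

    Both tensors hold row-normalized self-attention weights of shape
    (batch, heads, frames, frames).
    """
    # Detach the teacher (upper block) so gradients flow only into
    # the student (lower block) heads.
    teacher = attn_upper.detach()
    log_student = torch.log(attn_lower.clamp_min(1e-8))
    # KL(teacher || student) between attention rows, averaged over batch.
    return F.kl_div(log_student, teacher, reduction="batchmean")

# Hypothetical training step: add the SD term to the diarization loss
# over chosen (lower, upper) block pairs, weighted by a scalar lam.
# loss = diar_loss + lam * sum(
#     head_distillation_loss(attn[l], attn[u]) for l, u in pairs)
```

The abstract's other proposed variant would instead use the model's prediction output as the teacher signal for the lower SA heads; the two methods differ in the choice of teacher, not in this overall training pattern.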
Pages: 3197-3201 (5 pages)
Related Papers
50 records in total
  • [31] ONLINE END-TO-END NEURAL DIARIZATION WITH SPEAKER-TRACING BUFFER
    Xue, Yawen
    Horiguchi, Shota
    Fujita, Yusuke
    Watanabe, Shinji
    Garcia, Paola
    Nagamatsu, Kenji
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 841 - 848
  • [32] ESTMST-ST: An End-to-End Soft Threshold and Multiloss Self-Distillation Based Swin Transformer for Underwater Acoustic Signal Recognition
    Wu, Fan
    Yao, Haiyang
    Zhao, Zhongda
    Zhao, Xiaobo
    Zang, Yuzhang
    Wang, Haiyan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [33] Very Deep Self-Attention Networks for End-to-End Speech Recognition
    Ngoc-Quan Pham
    Thai-Son Nguyen
    Niehues, Jan
    Mueller, Markus
    Waibel, Alex
    INTERSPEECH 2019, 2019, : 66 - 70
  • [34] A Novel End-to-End Corporate Credit Rating Model Based on Self-Attention Mechanism
    Chen, Binbin
    Long, Shengjie
    IEEE ACCESS, 2020, 8 (08): 203876 - 203889
  • [35] Speaker diarization with variants of self-attention and joint speaker embedding extractor
    Fu, Pengbin
    Ma, Yuchen
    Yang, Huirong
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (05) : 9169 - 9180
  • [36] Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model
    Xu, Qiang
    Song, Tongtong
    Wang, Longbiao
    Shi, Hao
    Lin, Yuqin
    Lv, Yongjie
    Ge, Meng
    Yu, Qiang
    Dang, Jianwu
    INTERSPEECH 2022, 2022, : 1716 - 1720
  • [37] A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations
    Ma, Hui
    Wang, Jian
    Lin, Hongfei
    Zhang, Bo
    Zhang, Yijia
    Xu, Bo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 776 - 788
  • [38] TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION WITH LOCAL DENSE SYNTHESIZER ATTENTION
    Xu, Menglong
    Li, Shengqiang
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5899 - 5903
  • [39] Transformer-based end-to-end attack on text CAPTCHAs with triplet deep attention
    Zhang, Bo
    Xiong, Yu-Jie
    Xia, Chunming
    Gao, Yongbin
    COMPUTERS & SECURITY, 2024, 146
  • [40] TRANSFORMER-BASED ONLINE CTC/ATTENTION END-TO-END SPEECH RECOGNITION ARCHITECTURE
    Miao, Haoran
    Cheng, Gaofeng
    Gao, Changfeng
    Zhang, Pengyuan
    Yan, Yonghong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6084 - 6088