Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization

Cited by: 1
Authors
Jeoung, Ye-Rin [1]
Choi, Jeong-Hwan [1]
Seong, Ju-Seok [1]
Kyung, JeHyun [1]
Chang, Joon-Hyuk [1]
Affiliations
[1] Hanyang Univ, Dept Elect Engn, Seoul, South Korea
Source: INTERSPEECH 2023
Keywords
speaker diarization; end-to-end neural diarization; self-attention mechanism; fine-tuning; self-distillation;
DOI
10.21437/Interspeech.2023-1404
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this study, we explore self-distillation (SD) techniques to improve the performance of the transformer-encoder-based self-attentive (SA) end-to-end neural speaker diarization (EEND) model. We first apply SD approaches introduced in the automatic speech recognition field to the SA-EEND model to confirm their potential for speaker diarization. We then propose two novel SD methods for SA-EEND, which distill either the prediction output of the model or the SA heads of the upper blocks into the SA heads of the lower blocks. Consequently, we expect the high-level speaker-discriminative knowledge learned by the upper blocks to be shared with the lower blocks, enabling the SA heads of the lower blocks to effectively capture the discriminative patterns of overlapping speech from multiple speakers. Experimental results on the simulated and CALLHOME datasets show that SD generally improves the baseline performance and that the proposed methods outperform the conventional SD approaches.
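A minimal sketch of the second distillation variant described in the abstract (upper-block SA heads acting as teacher for lower-block SA heads), written in PyTorch as an assumption; the function names, the MSE criterion, and the alpha weight below are illustrative and not the authors' implementation. The attention maps of an upper transformer block are detached and used as a target for a lower block, and the resulting term is added to the usual diarization loss.

import torch
import torch.nn.functional as F

def attention_map_distillation_loss(lower_attn, upper_attn):
    # lower_attn, upper_attn: (batch, heads, time, time) self-attention weights.
    # The upper-block (teacher) maps are detached so gradients flow only into
    # the lower (student) blocks.
    return F.mse_loss(lower_attn, upper_attn.detach())

def total_loss(diarization_loss, attn_maps, distill_pairs, alpha=0.1):
    # attn_maps: list of per-block attention tensors from an SA-EEND-style encoder.
    # distill_pairs: (lower_block_idx, upper_block_idx) pairs to distill.
    # alpha: assumed weighting of the self-distillation term.
    sd_term = sum(attention_map_distillation_loss(attn_maps[lo], attn_maps[up])
                  for lo, up in distill_pairs)
    return diarization_loss + alpha * sd_term

# Toy usage with random attention maps from a 4-block, 4-head encoder:
attn = [torch.softmax(torch.randn(2, 4, 50, 50), dim=-1) for _ in range(4)]
loss = total_loss(torch.tensor(0.7), attn, distill_pairs=[(0, 3), (1, 3)])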
Pages: 3197 - 3201
Number of pages: 5
Related papers (50 in total)
  • [1] END-TO-END NEURAL SPEAKER DIARIZATION WITH SELF-ATTENTION
    Fujita, Yusuke
    Kanda, Naoyuki
    Horiguchi, Shota
    Xue, Yawen
    Nagamatsu, Kenji
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 296 - 303
  • [2] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81
  • [3] Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention
    Liang, Chengdong
    Xu, Menglong
    Zhang, Xiao-Lei
INTERSPEECH 2021, 2021, 2 : 1495 - 1499
  • [4] CASA-Net: Cross-attention and Self-attention for End-to-End Audio-visual Speaker Diarization
    Zhou, Haodong
    Li, Tao
    Wang, Jie
    Li, Lin
    Hong, Qingyang
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 102 - 106
  • [5] Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization
    Fujita, Yusuke
    Ogawa, Tetsuji
    Kobayashi, Tetsunori
    IEEE ACCESS, 2023, 11 : 140069 - 140076
  • [6] KNOWLEDGE DISTILLATION USING OUTPUT ERRORS FOR SELF-ATTENTION END-TO-END MODELS
    Kim, Ho-Gyeong
    Na, Hwidong
    Lee, Hoshik
    Lee, Jihyun
    Kang, Tae Gyoon
    Lee, Min-Joong
    Choi, Young Sang
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6181 - 6185
  • [7] End-to-End Neural Speaker Diarization with Absolute Speaker Loss
    Wang, Chao
    Li, Jie
    Fang, Xiang
    Kang, Jian
    Li, Yongxiang
    INTERSPEECH 2023, 2023, : 3577 - 3581
  • [8] An Improved End-to-End Multi-Target Tracking Method Based on Transformer Self-Attention
    Hong, Yong
    Li, Deren
    Luo, Shupei
    Chen, Xin
    Yang, Yi
    Wang, Mi
    REMOTE SENSING, 2022, 14 (24)
  • [9] End-to-End ASR with Adaptive Span Self-Attention
    Chang, Xuankai
    Subramanian, Aswin Shanmugam
    Guo, Pengcheng
    Watanabe, Shinji
    Fujita, Yuya
    Omachi, Motoi
    INTERSPEECH 2020, 2020, : 3595 - 3599
  • [10] Self-Attention Transducers for End-to-End Speech Recognition
    Tian, Zhengkun
    Yi, Jiangyan
    Tao, Jianhua
    Bai, Ye
    Wen, Zhengqi
    INTERSPEECH 2019, 2019, : 4395 - 4399