Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization

Cited by: 1
Authors
Jeoung, Ye-Rin [1 ]
Choi, Jeong-Hwan [1 ]
Seong, Ju-Seok [1 ]
Kyung, JeHyun [1 ]
Chang, Joon-Hyuk [1 ]
Affiliations
[1] Hanyang Univ, Dept Elect Engn, Seoul, South Korea
Source
INTERSPEECH 2023
Keywords
speaker diarization; end-to-end neural diarization; self-attention mechanism; fine-tuning; self-distillation
DOI
10.21437/Interspeech.2023-1404
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
In this study, we explore self-distillation (SD) techniques to improve the performance of the transformer-encoder-based self-attentive (SA) end-to-end neural speaker diarization (EEND) model. We first apply SD approaches introduced in the automatic speech recognition field to the SA-EEND model to confirm their potential for speaker diarization. We then propose two novel SD methods for SA-EEND, which distill either the prediction output of the model or the SA heads of the upper blocks into the SA heads of the lower blocks. The high-level speaker-discriminative knowledge learned by the upper blocks is thus shared with the lower blocks, enabling the SA heads of the lower blocks to effectively capture the discriminative patterns of overlapped speech from multiple speakers. Experimental results on the simulated and CALLHOME datasets show that SD generally improves the baseline performance, and that the proposed methods outperform the conventional SD approaches.
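As a concrete illustration of the two proposed SD variants, the following is a minimal PyTorch sketch. All names, tensor shapes, and loss choices (MSE for head-to-head matching; a posterior-similarity target with KL divergence for prediction-to-head matching) are assumptions made for exposition, since the abstract does not specify the exact formulation.

```python
# Illustrative sketch of the two self-distillation (SD) losses for SA-EEND.
# Assumed interfaces (not given in the abstract): attn_maps is a list of
# post-softmax attention tensors, one per encoder block, each of shape
# (batch, heads, T, T); posteriors holds the model's final frame-level
# speaker activities of shape (batch, T, num_speakers).
import torch
import torch.nn.functional as F

def head_to_head_sd_loss(attn_maps):
    """SD variant 1: distill the SA heads of each upper block into the
    heads of the block directly below it (MSE is an illustrative choice)."""
    loss = torch.tensor(0.0)
    for lower, upper in zip(attn_maps[:-1], attn_maps[1:]):
        # Stop-gradient on the upper (teacher) block's attention.
        loss = loss + F.mse_loss(lower, upper.detach())
    return loss / (len(attn_maps) - 1)

def prediction_to_head_sd_loss(attn_maps, posteriors):
    """SD variant 2: distill the model's prediction output into the lower
    blocks' SA heads. One plausible realization: turn the speaker posteriors
    into a (T x T) frame-similarity target and match each lower block's
    attention distribution to it with KL divergence."""
    sim = torch.matmul(posteriors, posteriors.transpose(1, 2))  # (B, T, T)
    target = F.softmax(sim, dim=-1).unsqueeze(1).detach()       # (B, 1, T, T)
    loss = torch.tensor(0.0)
    for attn in attn_maps[:-1]:  # lower blocks only
        log_attn = attn.clamp_min(1e-8).log()
        loss = loss + F.kl_div(log_attn, target.expand_as(attn),
                               reduction="batchmean")
    return loss / (len(attn_maps) - 1)

# Either SD term would be added to the diarization objective during
# fine-tuning, e.g. total = diar_loss + lam * sd_loss for some weight lam.
```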
Pages: 3197-3201
Page count: 5
Related Papers
50 records in total (entries [21]-[30] shown)
  • [21] IMPROVING MANDARIN END-TO-END SPEECH SYNTHESIS BY SELF-ATTENTION AND LEARNABLE GAUSSIAN BIAS
    Yang, Fengyu
    Yang, Shan
    Zhu, Pengcheng
    Yan, Pengju
    Xie, Lei
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019: 208-213
  • [22] END-TO-END SPEECH SUMMARIZATION USING RESTRICTED SELF-ATTENTION
    Sharma, Roshan
    Palaskar, Shruti
    Black, Alan W.
    Metze, Florian
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 8072-8076
  • [23] Efficient decoding self-attention for end-to-end speech synthesis
    Zhao, Wei
    Xu, Li
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2022, 23(07): 1127-1138
  • [24] End-to-End Learning for Video Frame Compression with Self-Attention
    Zou, Nannan
    Zhang, Honglei
    Cricri, Francesco
    Tavakoli, Hamed R.
    Lainema, Jani
    Aksu, Emre
    Hannuksela, Miska
    Rahtu, Esa
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020: 580-584
  • [25] End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors
    Rybicka, Magdalena
    Villalba, Jesus
    Dehak, Najim
    Kowalczyk, Konrad
    INTERSPEECH 2022, 2022: 5090-5094
  • [26] Efficient Semantic Segmentation via Self-Attention and Self-Distillation
    An, Shumin
    Liao, Qingmin
    Lu, Zongqing
    Xue, Jing-Hao
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23(09): 15256-15266
  • [27] End-to-End Neural Speaker Diarization With Non-Autoregressive Attractors
    Rybicka, Magdalena
    Villalba, Jesus
    Thebaud, Thomas
    Dehak, Najim
    Kowalczyk, Konrad
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 3960-3973
  • [28] End-to-End Neural Speaker Diarization Through Step-Function
    Latypov, Rustam
    Stolov, Evgeni
    2021 IEEE 15TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2021), 2021
  • [29] End-to-End Neural Speaker Diarization with Permutation-Free Objectives
    Fujita, Yusuke
    Kanda, Naoyuki
    Horiguchi, Shota
    Nagamatsu, Kenji
    Watanabe, Shinji
    INTERSPEECH 2019, 2019: 4300-4304
  • [30] End-to-end neural speaker diarization with an iterative adaptive attractor estimation
    Hao, Fengyuan
    Li, Xiaodong
    Zheng, Chengshi
    NEURAL NETWORKS, 2023, 166: 566-578