ONLINE END-TO-END NEURAL DIARIZATION WITH SPEAKER-TRACING BUFFER

被引:22
|
作者
Xue, Yawen [1 ]
Horiguchi, Shota [1 ]
Fujita, Yusuke [1 ]
Watanabe, Shinji [2 ]
Garcia, Paola [2 ]
Nagamatsu, Kenji [1 ]
机构
[1] Hitachi Ltd, Res & Dev Grp, Tokyo, Japan
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
关键词
Online speaker diarization; speaker-tracing buffer; end-to-end; self-attention;
D O I
10.1109/SLT48900.2021.9383523
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND). Online diarization inherently presents a speaker's permutation problem due to the possibility to assign speaker regions incorrectly across the recording. To circumvent this inconsistency, we proposed a speaker-tracing buffer mechanism that selects several input frames representing the speaker permutation information from previous chunks and stores them in a buffer. These buffered frames are stacked with the input frames in the current chunk and fed into a self-attention network. Our method ensures consistent diarization outputs across the buffer and the current chunk by checking the correlation between their corresponding outputs. Additionally, we trained SA-EEND with variable chunk-sizes to mitigate the mismatch between training and inference introduced by the speaker-tracing buffer mechanism. Experimental results, including online SA-EEND and variable chunk-size, achieved DERs of 12:54% for CALLHOME and 20:77% for CSJ with 1:4 s actual latency.
引用
收藏
页码:841 / 848
页数:8
相关论文
共 50 条
  • [1] TOWARDS END-TO-END SPEAKER DIARIZATION WITH GENERALIZED NEURAL SPEAKER CLUSTERING
    Zhang, Chunlei
    Shi, Jiatong
    Weng, Chao
    Yu, Meng
    Yu, Dong
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8372 - 8376
  • [2] END-TO-END NEURAL SPEAKER DIARIZATION WITH SELF-ATTENTION
    Fujita, Yusuke
    Kanda, Naoyuki
    Horiguchi, Shota
    Xue, Yawen
    Nagamatsu, Kenji
    Watanabe, Shinji
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 296 - 303
  • [3] End-to-End Audio-Visual Neural Speaker Diarization
    He, Mao-kui
    Du, Jun
    Lee, Chin-Hui
    [J]. INTERSPEECH 2022, 2022, : 1461 - 1465
  • [4] Robust End-to-end Speaker Diarization with Generic Neural Clustering
    Yang, Chenyu
    Wang, Yu
    [J]. INTERSPEECH 2022, 2022, : 1471 - 1475
  • [5] End-to-End Neural Speaker Diarization With Non-Autoregressive Attractors
    Rybicka, Magdalena
    Villalba, Jesus
    Thebaud, Thomas
    Dehak, Najim
    Kowalczyk, Konrad
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3960 - 3973
  • [6] End-To-End Neural Speaker Diarization Through Step-Function
    Latypov, Rustam
    Stolov, Evgeni
    [J]. 2021 IEEE 15TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2021), 2021,
  • [7] End-to-End Neural Speaker Diarization with Permutation-Free Objectives
    Fujita, Yusuke
    Kanda, Naoyuki
    Horiguchi, Shota
    Nagamatsu, Kenji
    Watanabe, Shinji
    [J]. INTERSPEECH 2019, 2019, : 4300 - 4304
  • [8] End-to-end neural speaker diarization with an iterative adaptive attractor estimation
    Hao, Fengyuan
    Li, Xiaodong
    Zheng, Chengshi
    [J]. NEURAL NETWORKS, 2023, 166 : 566 - 578
  • [9] END-TO-END SPEAKER DIARIZATION AS POST-PROCESSING
    Horiguchi, Shota
    Garcia, Paola
    Fujita, Yusuke
    Watanabe, Shinji
    Nagamatsu, Kenji
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7188 - 7192
  • [10] Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization
    Fujita, Yusuke
    Ogawa, Tetsuji
    Kobayashi, Tetsunori
    [J]. IEEE ACCESS, 2023, 11 (140069-140076) : 140069 - 140076