ONLINE END-TO-END NEURAL DIARIZATION WITH SPEAKER-TRACING BUFFER

Cited by: 22

Authors:

Xue, Yawen [1]

Horiguchi, Shota [1]

Fujita, Yusuke [1]

Watanabe, Shinji [2]

Garcia, Paola [2]

Nagamatsu, Kenji [1]

Affiliations:

[1] Hitachi Ltd, Res & Dev Grp, Tokyo, Japan

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

Source:

2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021

Keywords:

Online speaker diarization; speaker-tracing buffer; end-to-end; self-attention

DOI:

10.1109/SLT48900.2021.9383523

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND). Online diarization inherently presents a speaker permutation problem, because speaker regions may be assigned inconsistently across the recording. To circumvent this inconsistency, we propose a speaker-tracing buffer mechanism that selects several input frames representing the speaker permutation information from previous chunks and stores them in a buffer. These buffered frames are stacked with the input frames of the current chunk and fed into a self-attention network. Our method ensures consistent diarization outputs across the buffer and the current chunk by checking the correlation between their corresponding outputs. Additionally, we trained SA-EEND with variable chunk sizes to mitigate the mismatch between training and inference introduced by the speaker-tracing buffer mechanism. In experiments, online SA-EEND combined with variable chunk-size training achieved diarization error rates (DERs) of 12.54% for CALLHOME and 20.77% for CSJ with 1.4 s actual latency.
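The correlation check mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that the network's outputs for the buffered frames are available twice: once as stored from an earlier chunk, and once as recomputed together with the current chunk. The speaker order of the new outputs is then chosen to correlate best with the stored ones. The function names (`pearson`, `best_permutation`, `apply_permutation`), the use of Pearson correlation, and the exhaustive search over permutations are illustrative assumptions, not the authors' implementation.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def column(matrix, j):
    """Extract column j (one speaker's activity over frames) from a frames x speakers list."""
    return [row[j] for row in matrix]


def best_permutation(stored_buffer_out, new_buffer_out):
    """Return the speaker permutation of the new outputs that best matches the stored ones.

    Both arguments are frames x speakers lists of posterior probabilities
    for the same buffered frames (hypothetical data layout).
    """
    n_speakers = len(stored_buffer_out[0])

    def score(perm):
        return sum(
            pearson(column(stored_buffer_out, s), column(new_buffer_out, perm[s]))
            for s in range(n_speakers)
        )

    return max(permutations(range(n_speakers)), key=score)


def apply_permutation(outputs, perm):
    """Reorder the speaker columns of a frames x speakers list according to perm."""
    return [[row[p] for p in perm] for row in outputs]


# Toy data: the recomputed buffer outputs have the two speakers swapped.
stored = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
recomputed = [[0.2, 0.8], [0.1, 0.9], [0.95, 0.05]]
perm = best_permutation(stored, recomputed)
aligned_chunk = apply_permutation([[0.3, 0.7]], perm)
```

Exhaustive search over permutations is only practical for a small number of speakers, which is why it is used here purely for illustration.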

Pages: 841-848

Number of pages: 8
