LOW-LATENCY SPEECH SEPARATION GUIDED DIARIZATION FOR TELEPHONE CONVERSATIONS

Cited by: 2

Authors:

Morrone, Giovanni [1]

Cornell, Samuele [1]

Raj, Desh [2]

Serafini, Luca [1]

Zovato, Enrico [3]

Brutti, Alessio [4]

Squartini, Stefano [1]

Affiliations:

[1] Univ Politecn Marche, Ancona, Italy

[2] Johns Hopkins Univ, Baltimore, MD USA

[3] PerVoice S p A, Trento, Italy

[4] Fondazione Bruno Kessler, Trento, Italy

Source:

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022

Keywords:

online speaker diarization; speech separation; overlapped speech; deep learning; conversational telephone speech; SPEAKER DIARIZATION;

DOI:

10.1109/SLT54892.2023.10023280

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we analyze the use of speech separation guided diarization (SSGD) in telephone conversations. SSGD performs diarization by separating the speakers' signals and then applying voice activity detection to each estimated speaker signal. In particular, we compare two low-latency speech separation models. Moreover, we present a post-processing algorithm that significantly reduces the false alarm errors of an SSGD pipeline. We perform our experiments on two datasets, Fisher Corpus Part 1 and CALLHOME, evaluating both separation and diarization metrics. Notably, our SSGD DPRNN-based online model achieves 11.1% DER on CALLHOME, comparable with most state-of-the-art end-to-end neural diarization models despite being trained on an order of magnitude less data and having considerably lower latency, i.e., 0.1 vs. 10 seconds. We also show that the separated signals can be readily fed to a speech recognition back-end with performance close to that obtained with the oracle source signals.
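The second stage described in the abstract (voice activity detection applied to each estimated speaker signal) can be illustrated with a minimal sketch. This is not the authors' implementation: the separation model is assumed to have already produced one waveform per speaker, a plain energy threshold stands in for the paper's VAD, the post-processing step is not reproduced, and the names `energy_vad`, `flags_to_segments`, `ssgd_diarize`, the 8 kHz sample rate, the frame length, and the threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def energy_vad(signal: Sequence[float], frame_len: int = 160,
               threshold: float = 1e-3) -> List[bool]:
    """Return one flag per non-overlapping frame: True if mean energy > threshold.

    A simple energy threshold is an assumption, used here only as a stand-in VAD.
    """
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        flags.append(energy > threshold)
    return flags


def flags_to_segments(flags: Sequence[bool], frame_len: int = 160,
                      sample_rate: int = 8000) -> List[Tuple[float, float]]:
    """Merge runs of active frames into (start_sec, end_sec) segments."""
    segments = []
    run_start = None
    for i, active in enumerate(flags):
        if active and run_start is None:
            run_start = i
        elif not active and run_start is not None:
            segments.append((run_start * frame_len / sample_rate,
                             i * frame_len / sample_rate))
            run_start = None
    if run_start is not None:  # close a run that lasts until the end
        segments.append((run_start * frame_len / sample_rate,
                         len(flags) * frame_len / sample_rate))
    return segments


def ssgd_diarize(separated_signals: Sequence[Sequence[float]],
                 frame_len: int = 160, sample_rate: int = 8000,
                 threshold: float = 1e-3) -> Dict[str, List[Tuple[float, float]]]:
    """Run the stand-in VAD on each separated stream; one speaker per stream."""
    return {
        f"speaker_{k}": flags_to_segments(
            energy_vad(sig, frame_len, threshold), frame_len, sample_rate)
        for k, sig in enumerate(separated_signals)
    }


# Toy input: two already-separated streams whose activity overlaps in time.
stream0 = [0.5] * 800 + [0.0] * 800
stream1 = [0.0] * 480 + [0.5] * 800 + [0.0] * 320
diarization = ssgd_diarize([stream0, stream1])
```

Because each speaker has a dedicated stream, overlapped speech appears simply as segments from different streams that intersect in time, with no separate overlap detector in this sketch.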

Pages: 641-646

Number of pages: 6
