LOW-LATENCY SPEECH SEPARATION GUIDED DIARIZATION FOR TELEPHONE CONVERSATIONS

Cited by: 2
Authors
Morrone, Giovanni [1]
Cornell, Samuele [1]
Raj, Desh [2]
Serafini, Luca [1]
Zovato, Enrico [3]
Brutti, Alessio [4]
Squartini, Stefano [1]
Affiliations
[1] Univ Politecn Marche, Ancona, Italy
[2] Johns Hopkins Univ, Baltimore, MD USA
[3] PerVoice S.p.A., Trento, Italy
[4] Fondazione Bruno Kessler, Trento, Italy
Keywords
online speaker diarization; speech separation; overlapped speech; deep learning; conversational telephone speech; speaker diarization
DOI
10.1109/SLT54892.2023.10023280
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we analyze the use of speech separation guided diarization (SSGD) in telephone conversations. SSGD performs diarization by separating the speakers' signals and then applying voice activity detection to each estimated speaker signal. In particular, we compare two low-latency speech separation models. Moreover, we show a post-processing algorithm that significantly reduces the false alarm errors of an SSGD pipeline. We perform our experiments on two datasets, Fisher Corpus Part 1 and CALLHOME, evaluating both separation and diarization metrics. Notably, our SSGD DPRNN-based online model achieves 11.1% DER on CALLHOME, comparable with most state-of-the-art end-to-end neural diarization models despite being trained on an order of magnitude less data and having considerably lower latency, i.e., 0.1 vs. 10 seconds. We also show that the separated signals can be readily fed to a speech recognition back-end with performance close to that obtained with the oracle source signals.
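The two-stage idea described in the abstract (separate the speakers, then run voice activity detection on each estimated signal) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the input is taken to be already-separated signals (standing in for the output of a separation model such as the DPRNN mentioned above), and a simple energy threshold stands in for the voice activity detector. It is not the authors' implementation and does not include their post-processing step.

```python
import numpy as np

def energy_vad(signal, frame_len=800, threshold=1e-3):
    """Toy energy-based VAD (an assumption, not the paper's VAD).

    Returns one boolean per frame: True when the mean power of the
    frame exceeds the threshold.
    """
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[: n_frames * frame_len], (n_frames, frame_len))
    return np.mean(frames ** 2, axis=1) > threshold

def frames_to_segments(active, frame_len, sample_rate):
    """Merge runs of consecutive active frames into (start_sec, end_sec) tuples."""
    segments = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start * frame_len / sample_rate,
                             i * frame_len / sample_rate))
            start = None
    if start is not None:
        # The last run of active frames extends to the end of the signal.
        segments.append((start * frame_len / sample_rate,
                         len(active) * frame_len / sample_rate))
    return segments

def ssgd_diarize(separated, sample_rate=8000, frame_len=800, threshold=1e-3):
    """Hypothetical SSGD back-end: one VAD pass per separated speaker signal.

    `separated` is a sequence of 1-D arrays, one per estimated speaker.
    Returns {speaker_index: [(start_sec, end_sec), ...]}.
    """
    return {
        spk: frames_to_segments(energy_vad(sig, frame_len, threshold),
                                frame_len, sample_rate)
        for spk, sig in enumerate(separated)
    }
```

Because each speaker gets an independent activity track, overlapped speech is represented naturally: two speakers can both be marked active over the same time span.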
Pages: 641-646
Number of pages: 6