LOW-LATENCY SPEECH SEPARATION GUIDED DIARIZATION FOR TELEPHONE CONVERSATIONS

Cited by: 2
Authors
Morrone, Giovanni [1]
Cornell, Samuele [1]
Raj, Desh [2]
Serafini, Luca [1]
Zovato, Enrico [3]
Brutti, Alessio [4]
Squartini, Stefano [1]
Affiliations
[1] Univ Politecn Marche, Ancona, Italy
[2] Johns Hopkins Univ, Baltimore, MD USA
[3] PerVoice S p A, Trento, Italy
[4] Fondazione Bruno Kessler, Trento, Italy
Keywords
online speaker diarization; speech separation; overlapped speech; deep learning; conversational telephone speech; speaker diarization
DOI
10.1109/SLT54892.2023.10023280
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we analyze the use of speech separation guided diarization (SSGD) in telephone conversations. SSGD performs diarization by separating the speakers' signals and then applying voice activity detection to each estimated speaker signal. In particular, we compare two low-latency speech separation models. Moreover, we present a post-processing algorithm that significantly reduces the false alarm errors of an SSGD pipeline. We perform our experiments on two datasets, Fisher Corpus Part 1 and CALLHOME, evaluating both separation and diarization metrics. Notably, our SSGD DPRNN-based online model achieves 11.1% DER on CALLHOME, comparable with most state-of-the-art end-to-end neural diarization models despite being trained on an order of magnitude less data and having considerably lower latency, i.e., 0.1 vs. 10 seconds. We also show that the separated signals can be readily fed to a speech recognition back-end with performance close to that obtained with the oracle source signals.
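The second stage of the SSGD idea described in the abstract (voice activity detection applied to each separated speaker signal) can be illustrated with a minimal sketch. This is not the authors' implementation: the separation model (e.g., DPRNN) is assumed to have already produced one signal per speaker, and the simple frame-energy detector, the function names (`energy_vad`, `frames_to_segments`, `ssgd_diarize`), the 8 kHz sample rate, the 10 ms frame length, and the threshold are all illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def energy_vad(signal: Sequence[float], frame_len: int, threshold: float) -> List[bool]:
    """Hypothetical stand-in VAD: one decision per non-overlapping frame.

    A frame is marked active when its mean squared amplitude exceeds
    `threshold`. The paper's actual VAD is not specified here.
    """
    decisions = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        decisions.append(energy > threshold)
    return decisions


def frames_to_segments(
    decisions: Sequence[bool], frame_len: int, sample_rate: int
) -> List[Tuple[float, float]]:
    """Merge consecutive active frames into (start_sec, end_sec) segments."""
    segments = []
    start = None
    for i, active in enumerate(decisions):
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start * frame_len / sample_rate, i * frame_len / sample_rate))
            start = None
    if start is not None:  # signal ended while the speaker was still active
        segments.append((start * frame_len / sample_rate,
                         len(decisions) * frame_len / sample_rate))
    return segments


def ssgd_diarize(
    separated: Sequence[Sequence[float]],
    sample_rate: int = 8000,   # assumed telephone-band rate
    frame_len: int = 80,       # assumed 10 ms frames at 8 kHz
    threshold: float = 0.01,   # assumed energy threshold
) -> Dict[str, List[Tuple[float, float]]]:
    """Run the VAD independently on each separated signal.

    `separated` is assumed to be the output of a speech separation model,
    one signal per estimated speaker. Overlapped speech shows up naturally
    as time ranges that are active for more than one speaker.
    """
    return {
        f"spk{idx}": frames_to_segments(
            energy_vad(sig, frame_len, threshold), frame_len, sample_rate
        )
        for idx, sig in enumerate(separated)
    }


# Toy input: two synthetic "separated" signals whose activity overlaps.
spk0 = [0.5] * 800 + [0.0] * 800
spk1 = [0.0] * 400 + [0.5] * 1200
result = ssgd_diarize([spk0, spk1])
```

Because each speaker's activity is estimated from its own separated signal, no explicit overlap detector is needed in this sketch; any leakage from imperfect separation would appear as false alarms, which is the error type the abstract's post-processing step targets.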
Pages: 641-646
Number of pages: 6