ONLINE DEEP ATTRACTOR NETWORK FOR REAL-TIME SINGLE-CHANNEL SPEECH SEPARATION

Citations: 0
Authors
Han, Cong [1 ]
Luo, Yi [1 ]
Mesgarani, Nima [1 ]
Affiliations
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Funding
U.S. National Science Foundation;
Keywords
Source separation; speaker-independent; attractor network; real-time;
DOI
10.1109/icassp.2019.8682884
CLC (Chinese Library Classification) Number
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
Speaker-independent speech separation is a challenging audio processing problem. In recent years, several deep learning algorithms have been proposed to address it. Most of these methods use a noncausal implementation, which limits their application in real-time scenarios such as wearable hearing devices and low-latency telecommunication. In this paper, we propose the Online Deep Attractor Network (ODANet), a causal extension of the Deep Attractor Network (DANet) that enables real-time speech separation. In contrast to DANet, which estimates a global attractor point for each speaker from the entire utterance, ODANet estimates the attractors at each time step and tracks them using a dynamic weighting function with only causal information. This not only solves the speaker tracking problem but also allows ODANet to generate more stable embeddings across time. Experimental results show that ODANet achieves separation accuracy similar to that of the noncausal DANet on both two-speaker and three-speaker separation tasks, making it a suitable candidate for applications that require robust real-time speech processing.
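The abstract's core mechanism, estimating an attractor per speaker at every time step from causal information only, can be illustrated with a short sketch. The Python code below is a minimal, hypothetical illustration, not the paper's formulation: the function name track_attractors, the array shapes, and the simple running-mean update all stand in for the paper's dynamic weighting function, which is more elaborate.

import numpy as np

def track_attractors(embeddings, assignments, eps=1e-8):
    """Causally track one attractor per speaker.

    embeddings:  (T, F, D) array, a D-dim embedding per time-frequency bin.
    assignments: (T, F, C) array, soft speaker masks per time-frequency bin.
    Returns (T, C, D) attractors; the attractor at frame t uses frames 0..t only.

    Illustrative rule (an assumption): each speaker's attractor is the
    mask-weighted running mean of all embeddings seen so far, so it moves
    toward the current frame's centroid in proportion to how much energy
    that speaker has in the frame.
    """
    T, F, D = embeddings.shape
    C = assignments.shape[-1]
    attractors = np.zeros((T, C, D))
    acc_weight = np.full(C, eps)   # accumulated per-speaker mask weight
    acc_sum = np.zeros((C, D))     # accumulated mask-weighted embeddings
    for t in range(T):
        frame_w = assignments[t].sum(axis=0)           # (C,)
        frame_sum = assignments[t].T @ embeddings[t]   # (C, F) @ (F, D) -> (C, D)
        acc_weight += frame_w
        acc_sum += frame_sum
        attractors[t] = acc_sum / acc_weight[:, None]  # causal running mean
    return attractors

# Toy usage: 100 frames, 129 frequency bins, 20-dim embeddings, 2 speakers.
emb = np.random.randn(100, 129, 20)
masks = np.random.rand(100, 129, 2)
att = track_attractors(emb, masks)   # shape (100, 2, 20)

Because the attractor at frame t depends only on frames 0 through t, the same update can run frame-by-frame in a streaming pipeline, which is what makes this style of attractor tracking compatible with the real-time, low-latency applications the abstract targets.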
Pages: 361-365
Number of pages: 5