ONLINE DEEP ATTRACTOR NETWORK FOR REAL-TIME SINGLE-CHANNEL SPEECH SEPARATION

Cited by: 0
Authors
Han, Cong [1]
Luo, Yi [1]
Mesgarani, Nima [1]
Affiliation
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Funding
National Science Foundation (USA)
Keywords
Source separation; speaker-independent; attractor network; real-time;
DOI
10.1109/icassp.2019.8682884
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speaker-independent speech separation is a challenging audio processing problem. In recent years, several deep learning algorithms have been proposed to address it. Most of these methods use noncausal implementations, which limits their application in real-time scenarios such as wearable hearing devices and low-latency telecommunication. In this paper, we propose the Online Deep Attractor Network (ODANet), a causal extension of the Deep Attractor Network (DANet) that enables real-time speech separation. In contrast to DANet, which estimates a global attractor point for each speaker using the entire utterance, ODANet estimates the attractors at each time step and tracks them using a dynamic weighting function that relies only on causal information. This not only solves the speaker tracking problem but also allows ODANet to generate more stable embeddings across time. Experimental results show that ODANet achieves separation accuracy similar to that of the noncausal DANet on both two-speaker and three-speaker speech separation problems, which makes it a suitable candidate for applications that require robust real-time speech processing.
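The per-time-step attractor tracking described in the abstract can be illustrated with a minimal sketch. The record does not give the paper's actual weighting function, so everything below is an assumption made for illustration: attractors are taken to be assignment-weighted centroids of per-frequency-bin embeddings, assignments come from a softmax over dot products with the previous attractors, and the dynamic weight `alpha` is the current frame's share of all assignment mass seen so far. The names `update_attractors`, `mass`, and `alpha` are hypothetical and do not come from the paper.

```python
import math

EPS = 1e-12  # guards against division by zero


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def update_attractors(attractors, mass, frame_embeddings):
    """One causal step: refine per-speaker attractors using a single frame.

    attractors: list of C vectors (estimates from past frames only)
    mass: list of C floats (assignment weight accumulated over past frames)
    frame_embeddings: list of F vectors, one per frequency bin of this frame
    Returns (new_attractors, new_mass, assignments).
    """
    # Soft-assign each bin to a speaker by similarity to the previous attractors.
    assignments = [softmax([dot(v, a) for a in attractors])
                   for v in frame_embeddings]
    new_attractors, new_mass = [], []
    for c, prev in enumerate(attractors):
        frame_mass = sum(p[c] for p in assignments)
        # Attractor suggested by the current frame alone (weighted centroid).
        current = [
            sum(p[c] * v[k] for p, v in zip(assignments, frame_embeddings))
            / (frame_mass + EPS)
            for k in range(len(prev))
        ]
        # Assumed weighting: this frame's share of all mass seen so far.
        alpha = frame_mass / (mass[c] + frame_mass + EPS)
        new_attractors.append(
            [(1 - alpha) * prev[k] + alpha * current[k] for k in range(len(prev))]
        )
        new_mass.append(mass[c] + frame_mass)
    return new_attractors, new_mass, assignments


# Toy stream: 2 speakers, 2-dimensional embeddings, 2 bins per frame.
attractors = [[1.0, 0.0], [0.0, 1.0]]
mass = [1.0, 1.0]
frames = [
    [[2.0, 0.1], [0.1, 2.0]],
    [[1.8, 0.0], [0.2, 2.2]],
]
for frame in frames:
    attractors, mass, assignments = update_attractors(attractors, mass, frame)
```

Because each update reads only the current frame and statistics accumulated from earlier frames, the loop can run frame by frame as audio arrives. In this sketch the weight `alpha` shrinks as mass accumulates, so later frames move the attractors less; whether the paper's weighting function behaves the same way is not stated in this record.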
Pages: 361-365
Page count: 5
Related papers
50 in total
  • [31] Improving Deep Attractor Network by BGRU and GMM for Speech Separation
    Rawad Melhem
    Assef Jafar
    Riad Hamadeh
    [J]. Journal of Harbin Institute of Technology(New series), 2021, 28 (03) : 90 - 96
  • [32] SLEEPINCEPTIONNET: A DEEP LEARNING ALGORITHM FOR REAL-TIME SLEEP STAGES SCORING USING SINGLE-CHANNEL EEG
    Haghayegh, Shahab
    Hu, Kun
    Stone, Katie
    Redline, Susan
    Schernhammer, Eva
    [J]. SLEEP, 2022, 45 : A38 - A39
  • [33] Deep Domain Adaptation Enhances Amplification Curve Analysis for Single-Channel Multiplexing in Real-Time PCR
    Mao, Ye
    Xu, Ke
    Miglietta, Luca
    Kreitmann, Louis
    Moser, Nicolas
    Georgiou, Pantelis
    Holmes, Alison
    Rodriguez-Manzano, Jesus
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2023, 27 (06) : 3093 - 3103
  • [34] PHASE RECONSTRUCTION WITH LEARNED TIME-FREQUENCY REPRESENTATIONS FOR SINGLE-CHANNEL SPEECH SEPARATION
    Wichern, Gordon
    Le Roux, Jonathan
    [J]. 2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 396 - 400
  • [35] A Single-channel Speech Enhancement Approach Based on Perceptual Masking Deep Neural Network
    [J]. Zhang, Xiong-Wei (xwzhang9898@163.com), 2017, Science Press (43):
  • [36] CompNet: Complementary network for single-channel speech enhancement
    Fan, Cunhang
    Zhang, Hongmei
    Li, Andong
    Xiang, Wang
    Zheng, Chengshi
    Lv, Zhao
    Wu, Xiaopei
    [J]. NEURAL NETWORKS, 2023, 168 : 508 - 517
  • [37] TOWARDS REAL-TIME SINGLE-CHANNEL SINGING-VOICE SEPARATION WITH PRUNED MULTI-SCALED DENSENETS
    Huber, Markus
    Schindler, Gunther
    Roth, Wolfgang
    Froning, Holger
    Schorkhuber, Christian
    Pernkopf, Franz
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 806 - 810
  • [38] Single-Channel Online Enhancement of Speech Corrupted by Reverberation and Noise
    Doire, Clement S. J.
    Brookes, Mike
    Naylor, Patrick A.
    Hicks, Christopher M.
    Betts, Dave
    Dmour, Mohammad A.
    Jensen, Soren Holdt
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (03) : 572 - 587
  • [39] IMPROVED SINGLE-CHANNEL SPEECH SEPARATION USING SINUSOIDAL MODELING
    Mowlaee, Pejman
    Christensen, Mads Graesboll
    Jensen, Soren Holdt
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 21 - 24
  • [40] SPEAKER AND NOISE INDEPENDENT ONLINE SINGLE-CHANNEL SPEECH ENHANCEMENT
    Germain, Francois G.
    Mysore, Gautham J.
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 71 - 75