Online Deep Attractor Network for Real-Time Single-Channel Speech Separation

Cited: 0
Authors
Han, Cong [1 ]
Luo, Yi [1 ]
Mesgarani, Nima [1 ]
Affiliations
[1] Columbia University, Department of Electrical Engineering, New York, NY 10027, USA
Funding
U.S. National Science Foundation
Keywords
Source separation; speaker-independent; attractor network; real-time
DOI
10.1109/icassp.2019.8682884
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Speaker-independent speech separation is a challenging audio processing problem. In recent years, several deep learning algorithms have been proposed to address it. Most of these methods rely on a noncausal implementation, which limits their use in real-time scenarios such as wearable hearing devices and low-latency telecommunication. In this paper, we propose the Online Deep Attractor Network (ODANet), a causal extension of the Deep Attractor Network (DANet) that enables real-time speech separation. In contrast with DANet, which estimates a global attractor point for each speaker from the entire utterance, ODANet estimates the attractors at each time step and tracks them with a dynamic weighting function that uses only causal information. This not only solves the speaker tracking problem but also allows ODANet to generate more stable embeddings across time. Experimental results show that ODANet achieves separation accuracy similar to that of the noncausal DANet on both two-speaker and three-speaker separation tasks, making it a suitable candidate for applications that require robust real-time speech processing.
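The abstract's key mechanism, estimating an attractor at every time step and tracking it causally with a dynamic weighting function, can be illustrated with a short sketch. Everything below is a minimal illustration under stated assumptions, not the paper's implementation: the function name track_attractors, the dot-product softmax assignment, and the accumulated-mass weighting rule are illustrative choices, whereas ODANet derives its embeddings from a recurrent network and defines its own dynamic weights.

```python
import numpy as np

def track_attractors(embeddings, mixture_weights, init_attractors, floor=1e-8):
    """Causal (frame-by-frame) attractor tracking sketch.

    embeddings:      (T, F, D) per-frame, per-frequency embedding vectors
    mixture_weights: (T, F)    per-bin salience, e.g. normalized magnitudes
    init_attractors: (C, D)    one starting attractor per speaker
    Returns per-frame attractors (T, C, D) and soft masks (T, F, C).
    """
    T, F, D = embeddings.shape
    C = init_attractors.shape[0]
    attractors = init_attractors.astype(float).copy()
    accum = np.zeros(C)                  # accumulated assignment mass per speaker
    out_attr = np.empty((T, C, D))
    out_masks = np.empty((T, F, C))
    for t in range(T):
        v = embeddings[t]                           # (F, D)
        logits = v @ attractors.T                   # (F, C) similarity to attractors
        logits -= logits.max(axis=1, keepdims=True)
        masks = np.exp(logits)
        masks /= masks.sum(axis=1, keepdims=True)   # soft T-F assignments
        w = masks * mixture_weights[t][:, None]     # (F, C) salience-weighted
        mass = w.sum(axis=0) + floor                # (C,) evidence in this frame
        local = (w.T @ v) / mass[:, None]           # (C, D) per-frame attractor estimate
        alpha = mass / (accum + mass)               # dynamic weight, causal only
        attractors = (1 - alpha)[:, None] * attractors + alpha[:, None] * local
        accum += mass
        out_attr[t], out_masks[t] = attractors, masks
    return out_attr, out_masks

if __name__ == "__main__":
    # Tiny smoke test on random data (shapes only; no real speech involved).
    rng = np.random.default_rng(0)
    T, F, D, C = 100, 129, 20, 2
    emb = rng.standard_normal((T, F, D))
    mix_w = np.abs(rng.standard_normal((T, F)))
    attr, masks = track_attractors(emb, mix_w, rng.standard_normal((C, D)))
    print(attr.shape, masks.shape)  # (100, 2, 20) (100, 129, 2)
```

In this sketch the weight alpha = mass / (accum + mass) lets early frames move each attractor quickly while later frames only refine it, and every quantity depends solely on frames up to t, which is what makes the update causal in the sense the abstract describes.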
Pages: 361 - 365 (5 pages)
Related Papers (50 total)
  • [1] TasNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation
    Luo, Yi; Mesgarani, Nima
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018: 696-700
  • [2] Real-Time Single-Channel Dereverberation and Separation with Time-Domain Audio Separation Network
    Luo, Yi; Mesgarani, Nima
    Interspeech 2018, 2018: 342-346
  • [3] Real-Time Single-Channel Deep Neural Network-Based Speech Enhancement on Edge Devices
    Shankar, Nikhil; Bhat, Gautam Shreedhar; Panahi, Issa M. S.
    Interspeech 2020, 2020: 3281-3285
  • [4] Performance Comparison of Real-Time Single-Channel Speech Dereverberation Algorithms
    Xiong, Feifei; Meyer, Bernd T.; Cauchi, Benjamin; Jukic, Ante; Doclo, Simon; Goetze, Stefan
    2017 Hands-Free Speech Communications and Microphone Arrays (HSCMA 2017), 2017: 126-130
  • [5] Time-Domain Adaptive Attention Network for Single-Channel Speech Separation
    Wang, Kunpeng; Zhou, Hao; Cai, Jingxiang; Li, Wenna; Yao, Juan
    EURASIP Journal on Audio, Speech, and Music Processing, 2023, 2023(1)
  • [6] Deep Clustering in Complex Domain for Single-Channel Speech Separation
    Liu, Runling; Tang, Yu; Mang, Hongwei
    2022 IEEE 17th Conference on Industrial Electronics and Applications (ICIEA), 2022: 1463-1468
  • [7] Real-Time Single-Channel Speech Enhancement Based on Causal Attention Mechanism
    Fan, Junyi; Yang, Jibin; Zhang, Xiongwei; Yao, Yao
    Applied Acoustics, 2022, 201
  • [8] Unsupervised Single-Channel Speech Separation via Deep Neural Network for Different Gender Mixtures
    Wang, Yannan; Du, Jun; Dai, Li-Rong; Lee, Chin-Hui
    2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016
  • [9] Deep Neural Network for Supervised Single-Channel Speech Enhancement
    Saleem, Nasir; Irfan Khattak, Muhammad; Ali, Muhammad Yousaf; Shafi, Muhammad
    Archives of Acoustics, 2019, 44(1): 3-12