Time-domain adaptive attention network for single-channel speech separation

Cited by: 2
Authors
Wang, Kunpeng [1]
Zhou, Hao [1]
Cai, Jingxiang [1]
Li, Wenna [1]
Yao, Juan [1, 2]
Affiliations
[1] Southwest Univ Sci & Technol, Sch Informat Engn, Mianyang, Peoples R China
[2] Univ Sci & Technol China, Dept Automat, Hefei, Peoples R China
Keywords
Speech separation; Adaptive attention; Convolutional block attention; Transformer; NOISY;
DOI
10.1186/s13636-023-00283-w
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Recent years have witnessed great progress in single-channel speech separation through self-attention-based networks. Although such networks excel at mining relevant long-sequence contextual information, they cannot fully capture subtle details in speech signals, such as temporal or spectral continuity, spectral structure, and timbre. To tackle this problem, we propose a time-domain adaptive attention network (TAANet) that combines local and global attention networks. In the local attention networks, channel and spatial attention are introduced to focus on subtle details of the speech signals (frame-level features). In the global attention networks, a self-attention mechanism is used to explore the global associations of the speech contexts (utterance-level features). Moreover, we model the speech signal serially using multiple local and global attention blocks. Compared with other feature extraction methods for speech separation, this cascade structure enables our model to focus on local and global features adaptively, further boosting separation performance. Extensive experiments on benchmark datasets demonstrate that our approach outperforms other end-to-end speech separation methods, achieving 20.7 dB SI-SNRi and 20.9 dB SDRi on WSJ0-2mix.
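The SI-SNRi figure quoted in the abstract is the improvement in scale-invariant signal-to-noise ratio of the separated signal over the unprocessed mixture. The sketch below follows the commonly used definition of SI-SNR (zero-mean signals, projection of the estimate onto the reference); it is an illustration only, not the authors' evaluation code, and the function names `si_snr` and `si_snr_improvement` as well as the `eps` stabiliser are assumptions introduced here.

```python
import math


def _zero_mean(x):
    """Return a copy of x with its mean removed."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a reference signal.

    Hypothetical helper: eps only guards against division by zero.
    """
    est = _zero_mean(estimate)
    ref = _zero_mean(target)
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    proj = [scale * r for r in ref]
    # Everything that is not explained by the reference counts as noise.
    noise = [e - p for e, p in zip(est, proj)]
    proj_energy = sum(p * p for p in proj)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((proj_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric is called scale-invariant.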
Pages: 15
Related papers
50 in total
  • [1] Time-domain adaptive attention network for single-channel speech separation
    Kunpeng Wang
    Hao Zhou
    Jingxiang Cai
    Wenna Li
    Juan Yao
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2023
  • [2] TASNET: TIME-DOMAIN AUDIO SEPARATION NETWORK FOR REAL-TIME, SINGLE-CHANNEL SPEECH SEPARATION
    Luo, Yi
    Mesgarani, Nima
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 696 - 700
  • [3] Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network
    Luo, Yi
    Mesgarani, Nima
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 342 - 346
  • [4] IMPROVING NOISE ROBUST AUTOMATIC SPEECH RECOGNITION WITH SINGLE-CHANNEL TIME-DOMAIN ENHANCEMENT NETWORK
    Kinoshita, Keisuke
    Ochiai, Tsubasa
    Delcroix, Marc
    Nakatani, Tomohiro
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7009 - 7013
  • [5] Single-channel signal separation using time-domain basis functions
    Jang, GJ
    Lee, TW
    Oh, YH
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2003, 10 (06) : 168 - 171
  • [6] An Efficient Time-Domain End-to-End Single-Channel Bird Sound Separation Network
    Zhang, Chengyun
    Chen, Yonghuan
    Hao, Zezhou
    Gao, Xinghui
    [J]. ANIMALS, 2022, 12 (22):
  • [7] Single-Channel Speech Separation Focusing on Attention DE
    Li, Xinshu
    Tan, Zhenhua
    Xia, Zhenche
    Wu, Danke
    Zhang, Bin
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3204 - 3209
  • [8] Single-channel deep time-domain speech enhancement networks for cabin environments
    Zhang, Lin
    Wang, Haitao
    Yang, Shuang
    Zeng, Xiangyang
    Chen, Ke'an
    [J]. Shengxue Xuebao/Acta Acustica, 2023, 48 (04): 890 - 900
  • [9] DUAL-PATH RNN: EFFICIENT LONG SEQUENCE MODELING FOR TIME-DOMAIN SINGLE-CHANNEL SPEECH SEPARATION
    Luo, Yi
    Chen, Zhuo
    Yoshioka, Takuya
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 46 - 50
  • [10] Enhancement of Single-Channel Periodic Signals in the Time-Domain
    Jensen, Jesper Rindom
    Benesty, Jacob
    Christensen, Mads Graesboll
    Jensen, Soren Holdt
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (07): 1948 - 1963