MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR MULTI-CHANNEL SPEECH ENHANCEMENT

Cited by: 10
Authors
Zhang, Guochang [1]
Wang, Chunliang [1]
Yu, Libiao [1]
Wei, Jianqiang [1]
Affiliations
[1] Baidu Inc., Department of Speech Technology, Beijing 100085, People's Republic of China
Keywords
speech dense-prediction; speech enhancement; multi-scale; axial self-attention
DOI
10.1109/ICASSP43922.2022.9746902
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech quality is often degraded by background noise and reverberation, and a dense prediction network is usually used to reconstruct clean speech. This work proposes a novel backbone for speech dense prediction. With part of its input and output adjusted, the backbone is applied to the multi-channel speech enhancement task. To improve its performance, several strategies are designed: a multi-channel phase encoder, multi-scale temporal frequency processing, axial self-attention, and two-stage masking. The method is evaluated on the datasets of the ICASSP 2022 L3DAS22 Challenge. Experimental results show that it outperforms previous state-of-the-art baselines by a large margin and ranked second in the L3DAS22 Challenge. The backbone was also used for mono-channel speech enhancement and ranked first in both the ICASSP 2022 AEC and DNS Challenges (non-personal track).
Pages: 9206-9210
Number of pages: 5