MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR MULTI-CHANNEL SPEECH ENHANCEMENT

Cited by: 10
Authors
Zhang, Guochang [1]
Wang, Chunliang [1]
Yu, Libiao [1]
Wei, Jianqiang [1]
Affiliation
[1] Baidu Inc, Dept Speech Technol, Beijing 100085, Peoples R China
Keywords
speech dense-prediction; speech enhancement; multi-scale; axial self-attention;
DOI
10.1109/ICASSP43922.2022.9746902
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Speech quality is often degraded by background noise and reverberation. Usually, a dense prediction network is used to reconstruct clean speech. In this work, a novel backbone for speech dense-prediction is proposed. After adjusting part of the input and output, this backbone is applied to the multi-channel speech enhancement task. To improve the performance of the backbone, strategies such as a multi-channel phase encoder, multi-scale temporal frequency processing, axial self-attention, and two-stage masking are designed. The proposed method is evaluated on the datasets of the ICASSP 2022 L3DAS22 Challenge. The experimental results show that the proposed method outperforms previous state-of-the-art baselines by a large margin and ranked second in the L3DAS22 Challenge. The proposed backbone is also used for mono-channel speech enhancement and ranked first in both the ICASSP 2022 AEC and DNS Challenges (non-personal track).
Pages: 9206 - 9210
Page count: 5
Related papers
50 in total
  • [1] MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR SPEECH ENHANCEMENT
    Zhang, Guochang
    Yu, Libiao
    Wang, Chunliang
    Wei, Jianqiang
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 9122 - 9126
  • [2] Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
    Zhang, Zehua
    Zhang, Lu
    Zhuang, Xuyi
    Qian, Yukun
    Wang, Mingjiang
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [3] A Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel Speech Enhancement
    Xiang, Xiaoxiao
    Zhang, Xiaojuan
    Chen, Haozhe
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1455 - 1459
  • [4] A Multi-Channel and Multi-Scale Convolutional Neural Network for Hand Posture Recognition
    Feng, Jiawen
    Zhang, Limin
    Deng, Xiangyang
    Yu, Zhijun
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, PT II, 2017, 10614 : 785 - 785
  • [5] DEEP COMPLEX CONVOLUTIONAL RECURRENT NETWORK FOR MULTI-CHANNEL SPEECH ENHANCEMENT AND DEREVERBERATION
    Gelderblom, Femke B.
    Myrvoll, Tor Andre
    [J]. 2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021
  • [6] Multi-channel and multi-scale separable dilated convolutional neural network with attention mechanism for flue-cured tobacco classification
    Xu, Ming
    Gao, Jinfeng
    Zhang, Zhong
    Guo, Xin
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (21): 15511 - 15529
  • [7] Multi-Scale Spatial Attention-Based Multi-Channel 2D Convolutional Network for Soil Property Prediction
    Feng, Guolun
    Li, Zhiyong
    Zhang, Junbo
    Wang, Mantao
    [J]. SENSORS, 2024, 24 (14)
  • [8] Multi-channel and multi-scale separable dilated convolutional neural network with attention mechanism for flue-cured tobacco classification
    Ming Xu
    Jinfeng Gao
    Zhong Zhang
    Xin Guo
    [J]. Neural Computing and Applications, 2023, 35 : 15511 - 15529
  • [9] A MULTI-CHANNEL TEMPORAL ATTENTION CONVOLUTIONAL NEURAL NETWORK MODEL FOR ENVIRONMENTAL SOUND CLASSIFICATION
    Wang, You
    Feng, Chuyao
    Anderson, David, V
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 930 - 934
  • [10] Group Multi-Scale convolutional Network for Monaural Speech Enhancement in Time-domain
    Yu, Juntao
    Jiang, Ting
    Yu, Jiacheng
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021: 646 - 650