MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR MULTI-CHANNEL SPEECH ENHANCEMENT

Cited by: 10
Authors
Zhang, Guochang [1]
Wang, Chunliang [1]
Yu, Libiao [1]
Wei, Jianqiang [1]
Affiliation
[1] Baidu Inc, Dept Speech Technol, Beijing 100085, Peoples R China
Keywords
speech dense-prediction; speech enhancement; multi-scale; axial self-attention;
DOI
10.1109/ICASSP43922.2022.9746902
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech quality is often degraded by background noise and reverberation. Usually, a dense-prediction network is used to reconstruct clean speech. In this work, a novel backbone for speech dense prediction is proposed. After part of its input and output is adjusted, the backbone is applied to the multi-channel speech enhancement task. To improve its performance, strategies such as a multi-channel phase encoder, multi-scale temporal frequency processing, axial self-attention, and two-stage masking are designed. The proposed method is evaluated on the datasets of the ICASSP 2022 L3DAS22 Challenge. The experimental results show that it outperforms previous state-of-the-art baselines by a large margin and ranked second in the L3DAS22 Challenge. The proposed backbone is also used for mono-channel speech enhancement and ranked first in both the ICASSP 2022 AEC and DNS Challenges (non-personal track).
Pages: 9206-9210
Number of pages: 5
Related papers
50 in total
  • [1] MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR SPEECH ENHANCEMENT
    Zhang, Guochang
    Yu, Libiao
    Wang, Chunliang
    Wei, Jianqiang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9122 - 9126
  • [2] Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
    Zhang, Zehua
    Zhang, Lu
    Zhuang, Xuyi
    Qian, Yukun
    Wang, Mingjiang
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [3] A Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel Speech Enhancement
    Xiang, Xiaoxiao
    Zhang, Xiaojuan
    Chen, Haozhe
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1455 - 1459
  • [4] A Multi-Channel and Multi-Scale Convolutional Neural Network for Hand Posture Recognition
    Feng, Jiawen
    Zhang, Limin
    Deng, Xiangyang
    Yu, Zhijun
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, PT II, 2017, 10614 : 785 - 785
  • [5] Two-stage UNet with channel and temporal-frequency attention for multi-channel speech enhancement
    Xu, Shiyun
    Cao, Yinghan
    Zhang, Zehua
    Wang, Mingjiang
    SPEECH COMMUNICATION, 2025, 166
  • [6] DEEP COMPLEX CONVOLUTIONAL RECURRENT NETWORK FOR MULTI-CHANNEL SPEECH ENHANCEMENT AND DEREVERBERATION
    Gelderblom, Femke B.
    Myrvoll, Tor Andre
    2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021,
  • [7] Multi-channel and multi-scale separable dilated convolutional neural network with attention mechanism for flue-cured tobacco classification
    Xu, Ming
    Gao, Jinfeng
    Zhang, Zhong
    Guo, Xin
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (21): : 15511 - 15529
  • [8] Multi-channel and multi-scale separable dilated convolutional neural network with attention mechanism for flue-cured tobacco classification
    Ming Xu
    Jinfeng Gao
    Zhong Zhang
    Xin Guo
    Neural Computing and Applications, 2023, 35 : 15511 - 15529
  • [9] Multi-Scale Spatial Attention-Based Multi-Channel 2D Convolutional Network for Soil Property Prediction
    Feng, Guolun
    Li, Zhiyong
    Zhang, Junbo
    Wang, Mantao
    SENSORS, 2024, 24 (14)
  • [10] DConvT: Deep Convolution-Transformer Network Utilizing Multi-scale Temporal Attention for Speech Enhancement
    Hoang Ngoc Chau
    Anh Xuan Tran Thi
    Quoc Cuong Nguyen
    2024 IEEE TENTH INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS, ICCE 2024, 2024, : 398 - 402