MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR SPEECH ENHANCEMENT

Cited by: 22
Authors
Zhang, Guochang [1]
Yu, Libiao [1]
Wang, Chunliang [1]
Wei, Jianqiang [1]
Affiliation
[1] Baidu Inc, Dept Speech Technol, Beijing 100085, Peoples R China
Keywords
speech dense-prediction; speech enhancement; multi-scale; axial attention
DOI
10.1109/ICASSP43922.2022.9746610
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech quality is often degraded by acoustic echoes, background noise, and reverberation. In this paper, we propose a system consisting of deep learning and signal processing to simultaneously suppress echoes, noise, and reverberation. For the deep learning, we design a novel speech dense-prediction backbone. For the signal processing, a linear acoustic echo canceller is used as conditional information for deep learning. To improve the performance of the speech dense-prediction backbone, strategies such as a microphone and reference phase encoder, multi-scale time-frequency processing, and streaming axial attention are designed. The proposed system ranked first in both AEC and DNS Challenge (non-personal track) of ICASSP 2022. In addition, this backbone has also been extended to the multi-channel speech enhancement task, and placed second in ICASSP 2022 L3DAS22 Challenge.
Pages: 9122-9126
Number of pages: 5
Related papers
50 in total
  • [1] MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR MULTI-CHANNEL SPEECH ENHANCEMENT
    Zhang, Guochang
    Wang, Chunliang
    Yu, Libiao
    Wei, Jianqiang
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 9206-9210
  • [2] Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
    Zhang, Zehua
    Zhang, Lu
    Zhuang, Xuyi
    Qian, Yukun
    Wang, Mingjiang
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [3] Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement
    Zhang, Qiquan
    Song, Qi
    Nicolson, Aaron
    Lan, Tian
    Li, Haizhou
    [J]. INTERSPEECH 2021, 2021: 166-170
  • [4] A Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel Speech Enhancement
    Xiang, Xiaoxiao
    Zhang, Xiaojuan
    Chen, Haozhe
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28: 1455-1459
  • [5] Group Multi-Scale convolutional Network for Monaural Speech Enhancement in Time-domain
    Yu, Juntao
    Jiang, Ting
    Yu, Jiacheng
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021: 646-650
  • [6] An attention-based multi-scale temporal convolutional network for remaining useful life prediction
    Xu, Zhiqiang
    Zhang, Yujie
    Miao, Qiang
    [J]. RELIABILITY ENGINEERING & SYSTEM SAFETY, 2024, 250
  • [7] Underwater Image Enhancement with Multi-Scale Residual Attention Network
    Ueki, Yosuke
    Ikehara, Masaaki
    [J]. 2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021
  • [8] Multi-scale network with attention mechanism for underwater image enhancement
    Tao, Ye
    Tang, Jinhui
    Zhao, Xinwei
    Zhou, Chen
    Wang, Chong
    Zhao, Zhonglei
    [J]. NEUROCOMPUTING, 2024, 595
  • [9] Multi-scale informative perceptual network for monaural speech enhancement
    Lan, Tian
    Li, Jiajia
    Feng, Yujia
    Tai, Wenxin
    Wang, Yixiang
    Chen, Cong
    Kang, Jun
    Liu, Qiao
    [J]. APPLIED ACOUSTICS, 2022, 195
  • [10] A Multi-scale Convolutional Attention Based GRU Network for Text Classification
    Tang, Xianlun
    Chen, Yingjie
    Dai, Yuyan
    Xu, Jin
    Peng, Deguang
    [J]. 2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019: 3009-3013