Multi-channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder

Cited by: 14
|
Authors
Tawara, Naohiro [1 ]
Kobayashi, Tetsunori [1 ]
Ogawa, Tetsuji [1 ]
Institutions
[1] Waseda Univ, Dept Commun & Comp Engn, Tokyo, Japan
Source
INTERSPEECH 2019
Keywords
Time-domain denoising autoencoder; dilated convolutional network; multi-channel speech enhancement;
DOI
10.21437/Interspeech.2019-3197
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification code
100104; 100213;
Abstract
This paper investigates the use of time-domain convolutional denoising autoencoders (TCDAEs) with multiple channels as a method of speech enhancement. In general, denoising autoencoders (DAEs), deep learning systems that map noise-corrupted waveforms into clean waveforms, have been shown to generate high-quality signals while working in the time domain, without an intermediate stage of phase modeling. Convolutional DAEs are a popular structure that learns the mapping between noise-corrupted and clean waveforms with stacked convolutional layers. Multi-channel signals are promising inputs for TCDAEs because the different times of arrival of a signal across channels can be processed directly by the convolutional structure. Up to this time, however, TCDAEs have only been applied to single-channel signals. This paper explores the effectiveness of TCDAEs in a multi-channel configuration. Multi-channel TCDAEs are evaluated in multi-channel speech enhancement experiments, yielding significant improvements over single-channel DAEs in terms of signal-to-distortion ratio, perceptual evaluation of speech quality (PESQ), and word error rate.
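The abstract describes the core operation behind a TCDAE: dilated 1-D convolutions applied directly to multi-channel waveforms, so that inter-channel time-of-arrival differences fall within the filters' receptive field. Below is a minimal numpy sketch of one such dilated convolution layer; the layer sizes, channel counts, and function name are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution over a multi-channel waveform.

    x: (C_in, T)        multi-channel waveform segment
    w: (C_out, C_in, K) filter bank
    Returns (C_out, T); input is left-padded so the length is preserved.
    """
    c_out, c_in, k = w.shape
    pad = (k - 1) * dilation
    xp = np.pad(x, ((0, 0), (pad, 0)))  # causal left padding
    t = x.shape[1]
    y = np.zeros((c_out, t))
    for j in range(k):  # sum over filter taps, spaced `dilation` samples apart
        y += np.einsum('oi,it->ot', w[:, :, j],
                       xp[:, j * dilation : j * dilation + t])
    return y

# Example: 4 input channels (e.g. a 4-microphone array), 16 output
# channels, kernel size 3, dilation 2. Stacking such layers with
# exponentially growing dilations widens the receptive field without
# losing time resolution, which is the usual dilated-network design.
x = np.random.randn(4, 100)
w = np.random.randn(16, 4, 3)
y = dilated_conv1d(x, w, dilation=2)
print(y.shape)  # (16, 100)
```

A full TCDAE would stack many such layers (with nonlinearities and skip connections) and train the whole network to map the noisy multi-channel input to the clean target waveform.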
Pages: 86-90
Number of pages: 5
Related Papers
50 records in total
  • [1] EXPLORING MULTI-CHANNEL FEATURES FOR DENOISING-AUTOENCODER-BASED SPEECH ENHANCEMENT
    Araki, Shoko
    Hayashi, Tomoki
    Delcroix, Marc
    Fujimoto, Masakiyo
    Takeda, Kazuya
    Nakatani, Tomohiro
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 116 - 120
  • [2] CLOSING THE GAP BETWEEN TIME-DOMAIN MULTI-CHANNEL SPEECH ENHANCEMENT ON REAL AND SIMULATION CONDITIONS
    Zhang, Wangyou
    Shi, Jing
    Li, Chenda
    Watanabe, Shinji
    Qian, Yanmin
    [J]. 2021 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2021, : 146 - 150
  • [3] Multi-Channel Time-Domain Boring-Vibration-Enhancement Method Using RNN Networks
    Xu, Xiaolin
    Li, Juhu
    Zhang, Huarong
    [J]. INSECTS, 2023, 14 (10)
  • [4] Group Multi-Scale Convolutional Network for Monaural Speech Enhancement in Time-Domain
    Yu, Juntao
    Jiang, Ting
    Yu, Jiacheng
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 646 - 650
  • [5] MIMO Speech Compression and Enhancement Based on Convolutional Denoising Autoencoder
    Li, You-Jin
    Wang, Syu-Siang
    Tsao, Yu
    Su, Borching
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1245 - 1250
  • [6] Multi-channel speech enhancement using early and late fusion convolutional neural networks
    Priyanka, S. Siva
    Kumar, T. Kishore
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 973 - 979
  • [8] Development of a multi-channel time-domain fluorescence mammograph
    Hagen, A.
    Steinkellner, O.
    Grosenick, D.
    Moeller, M.
    Ziegler, R.
    Nielsen, T.
    Lauritsen, K.
    Macdonald, R.
    Rinneberg, H.
    [J]. OPTICAL TOMOGRAPHY AND SPECTROSCOPY OF TISSUE VII, 2007, 6434
  • [9] A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain
    Jiang, Tao
    Liu, Hongqing
    Zhou, Yi
    Gan, Lu
    [J]. COMMUNICATIONS AND NETWORKING (CHINACOM 2021), 2022, : 129 - 139
  • [10] MULTI-CHANNEL SPEECH DENOISING FOR MACHINE EARS
    Han, Cong
    Kaya, E. Merve
    Hoefer, Kyle
    Slaney, Malcolm
    Carlile, Simon
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 276 - 280