Convolutional fusion network for monaural speech enhancement

Cited by: 17
Authors
Xian, Yang [1 ,2 ]
Sun, Yang [3 ]
Wang, Wenwu [4 ]
Naqvi, Syed Mohsen [1 ]
Affiliations
[1] Newcastle Univ, Sch Engn, Intelligent Sensing & Commun Res Grp, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[2] ZhengZhou Univ Light Ind, Coll Comp & Commun Engn, Zhengzhou, Peoples R China
[3] Univ Oxford, Big Data Inst, Oxford OX3 7LF, England
[4] Univ Surrey, Ctr Vis Speech & Signal Proc, Dept Elect & Elect Engn, Surrey GU2 7XH, England
Keywords
Convolutional neural network; Model capacity; Shuffle; Group convolutional fusion unit; Depth-wise separable convolution; Intra skip connection; SOURCE SEPARATION; NEURAL-NETWORKS; CLASSIFICATION;
DOI
10.1016/j.neunet.2021.05.017
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional neural network (CNN) based methods, such as the convolutional encoder-decoder network, offer state-of-the-art results in monaural speech enhancement. In the conventional encoder-decoder network, large kernel size is often used to enhance the model capacity, which, however, results in low parameter efficiency. This could be addressed by using group convolution, as in AlexNet, where group convolutions are performed in parallel in each layer, before their outputs are concatenated. However, with the simple concatenation, the inter-channel dependency information may be lost. To address this, the Shuffle network re-arranges the outputs of each group before concatenating them, by taking part of the whole input sequence as the input to each group of convolution. In this work, we propose a new convolutional fusion network (CFN) for monaural speech enhancement by improving model performance, inter-channel dependency, information reuse and parameter efficiency. First, a new group convolutional fusion unit (GCFU) consisting of the standard and depth-wise separable CNN is used to reconstruct the signal. Second, the whole input sequence (full information) is fed simultaneously to two convolution networks in parallel, and their outputs are re-arranged (shuffled) and then concatenated, in order to exploit the inter-channel dependency within the network. Third, the intra skip connection mechanism is used to connect different layers inside the encoder as well as decoder to further improve the model performance. Extensive experiments are performed to show the improved performance of the proposed method as compared with three recent baseline methods. (C) 2021 Elsevier Ltd. All rights reserved.
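The abstract leans on two parameter-efficiency ideas: re-arranging (shuffling) the channel outputs of parallel group-convolution branches so later layers see inter-channel information, and replacing standard convolutions with depthwise-separable ones. A minimal, framework-free sketch of both (the function names are illustrative, not from the paper):

```python
def channel_shuffle(channels, groups):
    """Interleave channel outputs from `groups` parallel convolution
    branches (ShuffleNet-style), so the next grouped convolution sees
    channels from every branch rather than only its own group."""
    per_group = len(channels) // groups
    # View the flat list as (groups, per_group), transpose, and flatten.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

def standard_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution layer."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weight count of a depthwise (one k x k filter per input channel)
    plus pointwise (1 x 1) convolution pair."""
    return k * k * c_in + c_in * c_out

# Shuffling 6 channels produced by 2 branches interleaves the branches:
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))  # [0, 3, 1, 4, 2, 5]

# For a 3x3 layer with 64 input and 64 output channels, the
# depthwise-separable variant needs roughly 8x fewer weights:
print(standard_conv_params(3, 64, 64))        # 36864
print(depthwise_separable_params(3, 64, 64))  # 4672
```

In a real network the shuffle operates on feature-map tensors rather than a flat list, but the index permutation is the same; this is why the paper can feed the full input to parallel branches yet still mix information across groups.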
Pages: 97 - 107
Page count: 11
Related papers
50 records in total
  • [1] Dilated convolutional recurrent neural network for monaural speech enhancement
    Pirhosseinloo, Shadi
    Brumberg, Jonathan S.
    CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2019, : 158 - 162
  • [2] REDUNDANT CONVOLUTIONAL NETWORK WITH ATTENTION MECHANISM FOR MONAURAL SPEECH ENHANCEMENT
    Lan, Tian
    Lyu, Yilan
    Hui, Guoqiang
    Mokhosi, Refuoe
    Li, Sen
    Liu, Qiao
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6654 - 6658
  • [3] COMPLEX SPECTRAL MAPPING WITH A CONVOLUTIONAL RECURRENT NETWORK FOR MONAURAL SPEECH ENHANCEMENT
    Tan, Ke
    Wang, DeLiang
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6865 - 6869
  • [4] An Attention-augmented Fully Convolutional Neural Network for Monaural Speech Enhancement
    Xu, Zezheng
    Jiang, Ting
    Li, Chao
    Yu, Jiacheng
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [5] Low-Power Convolutional Recurrent Neural Network For Monaural Speech Enhancement
    Gao, Fei
    Guan, Haixin
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 559 - 563
  • [6] Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
    Zhang, Zehua
    Zhang, Lu
    Zhuang, Xuyi
    Qian, Yukun
    Wang, Mingjiang
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [7] Deep Attractor with Convolutional Network for Monaural Speech Separation
    Lan, Tian
    Qian, Yuxin
    Tai, Wenxin
    Chu, Boce
    Liu, Qiao
    2020 11TH IEEE ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2020, : 40 - 44
  • [8] PFRNet: Dual-Branch Progressive Fusion Rectification Network for Monaural Speech Enhancement
    Yu, Runxiang
    Zhao, Ziwei
    Ye, Zhongfu
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2358 - 2362
  • [9] Group Multi-Scale convolutional Network for Monaural Speech Enhancement in Time-domain
    Yu, Juntao
    Jiang, Ting
    Yu, Jiacheng
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 646 - 650
  • [10] SpecMNet: Spectrum mend network for monaural speech enhancement
    Fan, Cunhang
    Zhang, Hongmei
    Yi, Jiangyan
    Lv, Zhao
    Tao, Jianhua
    Li, Taihao
    Pei, Guanxiong
    Wu, Xiaopei
    Li, Sheng
    APPLIED ACOUSTICS, 2022, 194