Convolutional fusion network for monaural speech enhancement

Cited by: 17
Authors
Xian, Yang [1 ,2 ]
Sun, Yang [3 ]
Wang, Wenwu [4 ]
Naqvi, Syed Mohsen [1 ]
Affiliations
[1] Newcastle Univ, Sch Engn, Intelligent Sensing & Commun Res Grp, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[2] ZhengZhou Univ Light Ind, Coll Comp & Commun Engn, Zhengzhou, Peoples R China
[3] Univ Oxford, Big Data Inst, Oxford OX3 7LF, England
[4] Univ Surrey, Ctr Vis Speech & Signal Proc, Dept Elect & Elect Engn, Surrey GU2 7XH, England
Keywords
Convolutional neural network; Model capacity; Shuffle; Group convolutional fusion unit; Depth-wise separable convolution; Intra skip connection; SOURCE SEPARATION; NEURAL-NETWORKS; CLASSIFICATION;
DOI
10.1016/j.neunet.2021.05.017
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional neural network (CNN) based methods, such as the convolutional encoder-decoder network, offer state-of-the-art results in monaural speech enhancement. In the conventional encoder-decoder network, large kernel size is often used to enhance the model capacity, which, however, results in low parameter efficiency. This could be addressed by using group convolution, as in AlexNet, where group convolutions are performed in parallel in each layer, before their outputs are concatenated. However, with the simple concatenation, the inter-channel dependency information may be lost. To address this, the Shuffle network re-arranges the outputs of each group before concatenating them, by taking part of the whole input sequence as the input to each group of convolution. In this work, we propose a new convolutional fusion network (CFN) for monaural speech enhancement by improving model performance, inter-channel dependency, information reuse and parameter efficiency. First, a new group convolutional fusion unit (GCFU) consisting of the standard and depth-wise separable CNN is used to reconstruct the signal. Second, the whole input sequence (full information) is fed simultaneously to two convolution networks in parallel, and their outputs are re-arranged (shuffled) and then concatenated, in order to exploit the inter-channel dependency within the network. Third, the intra skip connection mechanism is used to connect different layers inside the encoder as well as decoder to further improve the model performance. Extensive experiments are performed to show the improved performance of the proposed method as compared with three recent baseline methods. (C) 2021 Elsevier Ltd. All rights reserved.
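The abstract leans on two parameter-efficiency ideas: re-arranging (shuffling) the channel outputs of parallel group-convolution branches so later layers see inter-channel information, and replacing standard convolutions with depthwise-separable ones. A minimal, framework-free sketch of both (the function names are illustrative, not from the paper):

```python
def channel_shuffle(channels, groups):
    """Interleave channel outputs from `groups` parallel convolution
    branches (ShuffleNet-style), so the next grouped convolution sees
    channels from every branch rather than only its own group."""
    per_group = len(channels) // groups
    # View the flat list as (groups, per_group), transpose, and flatten.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

def standard_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution layer."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weight count of a depthwise (one k x k filter per input channel)
    plus pointwise (1 x 1) convolution pair."""
    return k * k * c_in + c_in * c_out

# Shuffling 6 channels produced by 2 branches interleaves the branches:
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))  # [0, 3, 1, 4, 2, 5]

# For a 3x3 layer with 64 input and 64 output channels, the
# depthwise-separable variant needs roughly 8x fewer weights:
print(standard_conv_params(3, 64, 64))        # 36864
print(depthwise_separable_params(3, 64, 64))  # 4672
```

In a real network the shuffle operates on feature-map tensors rather than a flat list, but the index permutation is the same; this is why the paper can feed the full input to parallel branches yet still mix information across groups.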
Pages: 97 - 107
Page count: 11
Related papers
50 records in total
  • [1] Dilated convolutional recurrent neural network for monaural speech enhancement
    Pirhosseinloo, Shadi
    Brumberg, Jonathan S.
    CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2019, : 158 - 162
  • [2] REDUNDANT CONVOLUTIONAL NETWORK WITH ATTENTION MECHANISM FOR MONAURAL SPEECH ENHANCEMENT
    Lan, Tian
    Lyu, Yilan
    Hui, Guoqiang
    Mokhosi, Refuoe
    Li, Sen
    Liu, Qiao
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6654 - 6658
  • [3] COMPLEX SPECTRAL MAPPING WITH A CONVOLUTIONAL RECURRENT NETWORK FOR MONAURAL SPEECH ENHANCEMENT
    Tan, Ke
    Wang, DeLiang
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6865 - 6869
  • [4] An Attention-augmented Fully Convolutional Neural Network for Monaural Speech Enhancement
    Xu, Zezheng
    Jiang, Ting
    Li, Chao
    Yu, Jiacheng
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [5] Low-Power Convolutional Recurrent Neural Network For Monaural Speech Enhancement
    Gao, Fei
    Guan, Haixin
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 559 - 563
  • [6] Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
    Zhang, Zehua
    Zhang, Lu
    Zhuang, Xuyi
    Qian, Yukun
    Wang, Mingjiang
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [7] Deep Attractor with Convolutional Network for Monaural Speech Separation
    Lan, Tian
    Qian, Yuxin
    Tai, Wenxin
    Chu, Boce
    Liu, Qiao
    2020 11TH IEEE ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2020, : 40 - 44
  • [8] PFRNet: Dual-Branch Progressive Fusion Rectification Network for Monaural Speech Enhancement
    Yu, Runxiang
    Zhao, Ziwei
    Ye, Zhongfu
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2358 - 2362
  • [9] Group Multi-Scale convolutional Network for Monaural Speech Enhancement in Time-domain
    Yu, Juntao
    Jiang, Ting
    Yu, Jiacheng
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 646 - 650
  • [10] SpecMNet: Spectrum mend network for monaural speech enhancement
    Fan, Cunhang
    Zhang, Hongmei
    Yi, Jiangyan
    Lv, Zhao
    Tao, Jianhua
    Li, Taihao
    Pei, Guanxiong
    Wu, Xiaopei
    Li, Sheng
    APPLIED ACOUSTICS, 2022, 194