Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Full-Band Speech Enhancement

Times cited: 3

Authors:

Yu, Guochen [1,2]

Li, Andong [2]

Liu, Wenzhe [2]

Zheng, Chengshi [2]

Wang, Yutian [1]

Wang, Hui [1]

Affiliations:

[1] Commun Univ China, State Key Lab Media Convergence & Commun, Beijing, Peoples R China

[2] Chinese Acad Sci, Inst Acoust, Beijing, Peoples R China

Source:

2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2022

Keywords:

full-band speech enhancement; sub-bands fusion; dual-stream; decoupling-style concept; multi-stage; NETWORKS;

DOI:

10.1109/ISCSLP57327.2022.10037937

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Because modeling more frequency bands incurs high computational complexity, full-band speech enhancement based on deep neural networks remains intractable. Recent studies typically use compressed, perceptually motivated features with relatively low frequency resolution to filter the full-band spectrum with one-stage networks, which limits the achievable improvement in speech quality. In this paper, we propose a coordinated sub-band fusion network for full-band speech enhancement, which aims to recover the low (0-8 kHz), middle (8-16 kHz), and high (16-24 kHz) bands in a step-wise manner. Specifically, a dual-stream network is first pretrained to recover the low-band complex spectrum, and two further sub-networks are designed as middle- and high-band noise suppressors operating in the magnitude-only domain. To fully exploit information exchange between bands, we employ a sub-band interaction module that provides external knowledge guidance across the frequency bands. Extensive experiments show that the proposed method consistently outperforms state-of-the-art full-band baselines.
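The three-band split described in the abstract can be made concrete with a small sketch. It assumes a 48 kHz sampling rate (implied by the 24 kHz upper band edge) and a hypothetical 960-point FFT, so one-sided STFT bins are 50 Hz apart. The hypothetical helper `band_bin_ranges` maps the band edges to bin indices, and the hypothetical `fuse_frame` shows one way a complex low-band estimate and real-valued magnitude gains for the middle and high bands could be stitched back into a single full-band frame, assuming the noisy phase is reused in the magnitude-only bands. The low-band estimate and the gains are placeholders for the networks' outputs; the network architectures and the sub-band interaction module are not modelled, and this is not the authors' implementation.

```python
import cmath
from typing import List, Sequence, Tuple

BinRange = Tuple[int, int]  # half-open [start, stop) range of STFT bin indices


def band_bin_ranges(sample_rate: int = 48000, n_fft: int = 960,
                    edges_hz: Sequence[float] = (0, 8000, 16000, 24000)) -> List[BinRange]:
    """Map band edges in Hz to bin ranges of a one-sided STFT (assumed settings)."""
    bin_hz = sample_rate / n_fft   # spacing between adjacent bins (50 Hz by default)
    n_bins = n_fft // 2 + 1        # one-sided spectrum: DC up to and including Nyquist
    n_bands = len(edges_hz) - 1
    ranges = []
    for i in range(n_bands):
        start = round(edges_hz[i] / bin_hz)
        # The top band also keeps the Nyquist bin, so every bin belongs to exactly one band.
        stop = n_bins if i == n_bands - 1 else round(edges_hz[i + 1] / bin_hz)
        ranges.append((start, stop))
    return ranges


def fuse_frame(noisy: Sequence[complex], low_est: Sequence[complex],
               mid_gain: Sequence[float], high_gain: Sequence[float],
               ranges: Sequence[BinRange]) -> List[complex]:
    """Reassemble one full-band frame from hypothetical per-band outputs.

    low_est:   complex low-band spectrum (stand-in for the dual-stream network).
    mid_gain,
    high_gain: real magnitude gains (stand-ins for the magnitude-only suppressors).
    """
    (l0, l1), (m0, m1), (h0, h1) = ranges
    if len(low_est) != l1 - l0 or len(mid_gain) != m1 - m0 or len(high_gain) != h1 - h0:
        raise ValueError("per-band outputs do not match the bin ranges")
    out = list(low_est)  # low band: complex estimate is used directly
    for (start, stop), gains in (((m0, m1), mid_gain), ((h0, h1), high_gain)):
        for k, g in zip(range(start, stop), gains):
            # Scale the noisy magnitude and keep the noisy phase.
            out.append(cmath.rect(g * abs(noisy[k]), cmath.phase(noisy[k])))
    return out


ranges = band_bin_ranges()
print(ranges)  # [(0, 160), (160, 320), (320, 481)]
```

With these assumed settings the low and middle bands each span 160 bins and the high band 161 bins (including the Nyquist bin); a different FFT size changes the counts but not the 8 kHz band boundaries.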


Pages: 483-487

Number of pages: 5
