A Two-Stage Beamforming and Diffusion-Based Refiner System for 3D Speech Enhancement

Cited by: 0
Authors
Chen, Feilong [1]
Lin, Wenmo [1]
Sun, Chengli [1]
Guo, Qiaosheng [2]
Affiliations
[1] Nanchang Hangkong Univ, Sch Informat Engn, Nanchang 330063, Peoples R China
[2] Chaoyang Jushengtai Xinfeng Technol Co Ltd, Ganzhou 341001, Peoples R China
Keywords
Speech enhancement; 3D speech signal; Diffusion model; Beamforming; Multi-channel;
DOI
10.1007/s00034-024-02652-y
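Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809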
Abstract
Speech enhancement in 3D reverberant environments is a challenging and important problem for downstream applications such as speech recognition, speaker identification, and audio analysis. Existing deep neural network models are effective for 3D speech enhancement, but they often introduce distortions or unnatural artifacts into the enhanced speech. In this work, we propose a novel two-stage refiner system that integrates a neural beamforming network and a diffusion model for robust 3D speech enhancement. The neural beamforming network performs spatial filtering to suppress noise and reverberation, while the diffusion model uses its generative capability to restore missing or distorted speech components in the beamformed output. To the best of our knowledge, this is the first work to apply a diffusion model as a backend refiner for 3D speech enhancement. We investigate the effect of training the diffusion model on either enhanced speech or clean speech, and find that training on clean speech better captures prior knowledge of speech components and improves speech recovery. We evaluate the proposed system on different datasets and beamformer architectures and show consistent improvements in metrics such as WER and NISQA. This indicates that the diffusion model generalizes well and can serve as a backend refinement module for 3D speech enhancement regardless of the front-end beamforming network. Our work demonstrates the effectiveness of integrating discriminative and generative models for robust 3D speech enhancement and opens a new direction for applying generative diffusion models to 3D speech processing tasks.
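The two-stage structure described in the abstract (a spatial-filtering front end followed by a refinement back end) can be sketched as a simple function composition. This is only an illustrative sketch, not the authors' system: a classical delay-and-sum beamformer stands in for the neural beamforming network, and an identity function stands in for the diffusion refiner. The names `delay_and_sum`, `identity_refiner`, and `two_stage_enhance` are hypothetical and do not come from the paper.

```python
from typing import Callable

import numpy as np


def delay_and_sum(mixture: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """Classical delay-and-sum beamformer (a stand-in for the neural beamformer).

    mixture: array of shape (channels, samples), one row per microphone.
    delays:  array of shape (channels,), integer sample delay of each channel.
    """
    channels, samples = mixture.shape
    aligned = np.zeros((channels, samples), dtype=float)
    for c in range(channels):
        # Undo each channel's delay. np.roll wraps around circularly,
        # which is acceptable only for this toy illustration.
        aligned[c] = np.roll(mixture[c], -int(delays[c]))
    # Averaging the aligned channels reinforces the target and averages down noise.
    return aligned.mean(axis=0)


def identity_refiner(beamformed: np.ndarray) -> np.ndarray:
    """Placeholder for the diffusion-based refiner (assumption: returns input unchanged)."""
    return beamformed


def two_stage_enhance(
    mixture: np.ndarray,
    beamformer: Callable[[np.ndarray], np.ndarray],
    refiner: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Stage 1: spatial filtering. Stage 2: refinement of the beamformed output."""
    beamformed = beamformer(mixture)
    return refiner(beamformed)


# Toy usage: three microphones that each receive a delayed copy of one source.
rng = np.random.default_rng(0)
source = rng.standard_normal(64)
mic_delays = np.array([0, 2, 5])
mixture = np.stack([np.roll(source, d) for d in mic_delays])
enhanced = two_stage_enhance(
    mixture,
    beamformer=lambda m: delay_and_sum(m, mic_delays),
    refiner=identity_refiner,
)
```

Because the refiner is passed in as a plain callable, it is independent of the front end, which mirrors the abstract's claim that the refinement module can follow different beamforming networks.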
Pages: 4369-4389
Page count: 21