On phase recovery and preserving early reflections for deep-learning speech dereverberation

Cited by: 1
Authors
Luo, Xiaoxue [1, 2]
Ke, Yuxuan [1, 2]
Li, Xiaodong [1, 2]
Zheng, Chengshi [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
SEPARATION; NETWORKS; INTELLIGIBILITY; REVERBERATION; ALGORITHM; MASKING;
DOI
10.1121/10.0024348
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In indoor environments, reverberation often distorts clean speech. Although deep learning-based speech dereverberation approaches perform much better than traditional ones, the inferior quality of the dereverberated speech, caused by magnitude distortion and limited phase recovery, remains a serious problem for practical applications. This paper improves the performance of deep learning-based speech dereverberation from the perspectives of both network design and mapping target optimization. On the one hand, a bifurcated-and-fusion network and its guidance loss functions were designed to reduce the magnitude distortion while enhancing the phase recovery. On the other hand, the time boundary between the early and late reflections in the mapped speech was investigated, so as to strike a balance between the reverberation tailing effect and the difficulty of magnitude/phase recovery. Mathematical derivations were provided to justify the specially designed loss functions. Geometric illustrations were given to explain why preserving early reflections reduces the difficulty of phase recovery. Ablation study results confirmed the validity of the proposed network topology and the importance of preserving 20 ms of early reflections in the mapped speech. Objective and subjective test results showed that the proposed system outperformed the baselines in the speech dereverberation task.
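The idea of a mapping target that keeps 20 ms of early reflections can be illustrated with a small sketch. It is not the authors' data pipeline, which the abstract does not describe. It assumes a known room impulse response (RIR), takes the strongest tap as the direct path, and uses hypothetical helper names (`split_rir`, `make_training_pair`) and a synthetic RIR purely for illustration.

```python
import numpy as np


def split_rir(rir, fs, early_ms=20.0):
    """Split an RIR into an early part (direct path + early reflections) and a late part.

    Assumption: the direct path is the tap with the largest magnitude.
    """
    rir = np.asarray(rir, dtype=float)
    direct = int(np.argmax(np.abs(rir)))
    # Keep everything up to early_ms after the direct path (inclusive).
    boundary = min(len(rir), direct + int(round(early_ms * 1e-3 * fs)) + 1)
    early = np.zeros_like(rir)
    early[:boundary] = rir[:boundary]
    late = rir - early
    return early, late


def make_training_pair(clean, rir, fs, early_ms=20.0):
    """Return (reverberant input, target that preserves early reflections)."""
    early, _ = split_rir(rir, fs, early_ms)
    reverberant = np.convolve(clean, rir)[: len(clean)]
    target = np.convolve(clean, early)[: len(clean)]
    return reverberant, target


# Synthetic example (illustrative values only).
fs = 16000
rng = np.random.default_rng(0)
rir = np.zeros(4000)
rir[32] = 1.0  # direct path
tail = np.arange(4000 - 33)
rir[33:] = 0.1 * rng.standard_normal(tail.size) * np.exp(-tail / 800.0)

clean = rng.standard_normal(fs)  # stand-in for one second of speech
reverberant, target = make_training_pair(clean, rir, fs)
```

Setting `early_ms=0.0` in this sketch would give a target that keeps only the direct path and whatever precedes it, which is the other end of the trade-off the abstract mentions between the reverberation tail and the difficulty of magnitude/phase recovery.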
Pages: 436-451
Number of pages: 16
Related papers
50 items in total
  • [21] A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation
    Chen, Hangting
    Zhang, Pengyuan
    NEURAL NETWORKS, 2021, 141 : 238 - 248
  • [22] Data Reduction and Deep-Learning Based Recovery for Geospatial Visualization and Satellite Imagery
    Tasnim, Jarin
    Mondal, Debajyoti
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 5276 - 5285
  • [23] An interpretable deep-learning model for early prediction of sepsis in the emergency department
    Zhang, Dongdong
    Yin, Changchang
    Hunold, Katherine M.
    Jiang, Xiaoqian
    Caterino, Jeffrey M.
    Zhang, Ping
    PATTERNS, 2021, 2 (02):
  • [24] Deep-Learning Phase-Onset Picker for Deep Earth Seismology: PKIKP Waves
    Zhou, Jiarun
    Pham, Thanh-Son
    Tkalcic, Hrvoje
    JOURNAL OF GEOPHYSICAL RESEARCH-SOLID EARTH, 2024, 129 (09)
  • [25] Deep-learning prediction of amyloid deposition from early-phase amyloid positron emission tomography imaging
    Seisaku Komori
    Donna J. Cross
    Megan Mills
    Yasuomi Ouchi
    Sadahiko Nishizawa
    Hiroyuki Okada
    Takashi Norikane
    Tanyaluck Thientunyakit
    Yoshimi Anzai
    Satoshi Minoshima
    Annals of Nuclear Medicine, 2022, 36 : 913 - 921
  • [26] Deep-learning prediction of amyloid deposition from early-phase amyloid positron emission tomography imaging
    Komori, Seisaku
    Cross, Donna J.
    Mills, Megan
    Ouchi, Yasuomi
    Nishizawa, Sadahiko
    Okada, Hiroyuki
    Norikane, Takashi
    Thientunyakit, Tanyaluck
    Anzai, Yoshimi
    Minoshima, Satoshi
    ANNALS OF NUCLEAR MEDICINE, 2022, 36 (10) : 913 - 921
  • [27] PhaseGAN: a deep-learning phase-retrieval approach for unpaired datasets
    Zhang, Yuhe
    Noack, Mike Andreas
    Vagovic, Patrik
    Fezzaa, Kamel
    Garcia-Moreno, Francisco
    Ritschel, Tobias
    Villanueva-Perez, Pablo
    OPTICS EXPRESS, 2021, 29 (13) : 19593 - 19604
  • [28] Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques
    Haeb-Umbach, Reinhold
    Watanabe, Shinji
    Nakatani, Tomohiro
    Bacchiani, Michiel
    Hoffmeister, Bjoern
    Seltzer, Michael L.
    Zen, Heiga
    Souden, Mehrez
    IEEE SIGNAL PROCESSING MAGAZINE, 2019, 36 (06) : 111 - 124
  • [29] A Hybrid Deep-Learning Approach for Single Channel HF-SSB Speech Enhancement
    Chen, Yantao
    Dong, Binhong
    Zhang, Xiaoxue
    Gao, Pengyu
    Li, Shaoqian
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2021, 10 (10) : 2165 - 2169
  • [30] Enhanced Deep-Learning Method for Marine Gravity Recovery From Altimetry and Bathymetry Data
    Qiu, Licheng
    Zhu, Chengcheng
    Guo, Jinyun
    Yang, Lei
    Li, Wanqiu
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5