Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction

Cited by: 2
Authors
Wang, Zhong-Qiu [1]
Watanabe, Shinji [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
Prediction algorithms; Speech enhancement; Discrete Fourier transforms; Spectrogram; Predictive models; Signal processing algorithms; Time-domain analysis; Deep learning; online speech enhancement; RECURRENT NETWORKS; MASKING;
DOI
10.1109/LSP.2022.3183473
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Frame-online speech enhancement systems in the short-time Fourier transform (STFT) domain usually have an algorithmic latency equal to the window size due to the use of overlap-add in the inverse STFT (iSTFT). This algorithmic latency allows the enhancement models to leverage future contextual information up to a length equal to the window size. However, this information is only partially leveraged by current frame-online systems. To fully exploit it, we propose an overlapped-frame prediction technique for deep learning based frame-online speech enhancement, where at each frame our deep neural network (DNN) predicts the current and several past frames that are necessary for overlap-add, instead of only predicting the current frame. In addition, we propose a loss function to account for the scale difference between predicted and oracle target signals. Experiments on a noisy-reverberant speech enhancement task show the effectiveness of the proposed algorithms.
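The overlap-add bookkeeping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, array layouts, and the assumption that the window size is an integer multiple of the hop size are all hypothetical, and the arrays stand in for time-domain frames that a DNN and the inverse DFT would produce. In the conventional scheme, the output segment emitted at step t sums slices of frames that were each predicted at their own step. In the overlapped-frame scheme, every slice comes from frames re-predicted at step t, so all contributions have seen the same context up to frame t.

```python
import numpy as np

def stream_conventional(frames, hop):
    """Frame-online overlap-add, one frame predicted per step.

    frames: (T, W) array; frames[t] is the windowed time-domain frame
            predicted at step t (hypothetical layout).
    Returns the T*hop samples that are finalized after T steps.
    """
    T, W = frames.shape
    R = W // hop  # number of frames overlapping each hop-sized segment
    out = np.zeros(T * hop)
    for t in range(T):
        for k in range(min(R, t + 1)):
            # Frame t-k was predicted k steps ago, without seeing frame t.
            out[t * hop:(t + 1) * hop] += frames[t - k, k * hop:(k + 1) * hop]
    return out

def stream_overlapped(preds, hop):
    """Overlap-add with overlapped-frame prediction.

    preds: (T, R, W) array; preds[t, k] is the estimate of frame t-k
           made at step t (hypothetical layout).
    """
    T, R, W = preds.shape
    out = np.zeros(T * hop)
    for t in range(T):
        for k in range(min(R, t + 1)):
            # Every slice used here was predicted at the current step t.
            out[t * hop:(t + 1) * hop] += preds[t, k, k * hop:(k + 1) * hop]
    return out
```

When the re-predicted past frames happen to equal the original per-step predictions, the two functions return the same signal; any difference between them comes only from the extra context used when re-predicting past frames.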
Pages: 1422-1426
Page count: 5
Related papers
50 in total
  • [1] FRAME-ONLINE DNN-WPE DEREVERBERATION
    Heymann, Jahn
    Drude, Lukas
    Haeb-Umbach, Reinhold
    Kinoshita, Keisuke
    Nakatani, Tomohiro
    [J]. 2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 466 - 470
  • [2] DNN-FREE LOW-LATENCY ADAPTIVE SPEECH ENHANCEMENT BASED ON FRAME-ONLINE BEAMFORMING POWERED BY BLOCK-ONLINE FASTMNMF
    Nugraha, Aditya Arie
    Sekiguchi, Kouhei
    Fontaine, Mathieu
    Bando, Yoshiaki
    Yoshii, Kazuyoshi
    [J]. 2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [3] Study on the frame synchronized segregation of overlapped speech
    Dai, Li-Rong
    Song, Yan
    Wang, Ren-Hua
    [J]. Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2002, 30 (10): : 1552 - 1554
  • [4] SEQUENTIAL MULTI-FRAME NEURAL BEAMFORMING FOR SPEECH SEPARATION AND ENHANCEMENT
    Wang, Zhong-Qiu
    Erdogan, Hakan
    Wisdom, Scott
    Wilson, Kevin
    Raj, Desh
    Watanabe, Shinji
    Chen, Zhuo
    Hershey, John R.
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 905 - 911
  • [5] Long-Frame-Shift Neural Speech Phase Prediction With Spectral Continuity Enhancement and Interpolation Error Compensation
    Ai, Yang
    Lu, Ye-Xin
    Ling, Zhen-Hua
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1097 - 1101
  • [6] Long-Frame-Shift Neural Speech Phase Prediction With Spectral Continuity Enhancement and Interpolation Error Compensation
    Ai, Yang
    Lu, Ye-Xin
    Ling, Zhen-Hua
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1097 - 1101
  • [7] ONLINE INTER-FRAME CORRELATION ESTIMATION METHODS FOR SPEECH ENHANCEMENT IN FREQUENCY SUBBANDS
    Schasse, Alexander
    Martin, Rainer
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7482 - 7486
  • [8] Markov chain prediction for missing speech frame compensation
    Kohler, MA
    Yarlagadda, RK
    [J]. 2000 IEEE WORKSHOP ON SPEECH CODING, PROCEEDINGS: MEETING THE CHALLENGES OF THE NEW MILLENNIUM, 2000, : 75 - 77
  • [9] Frame-based subband Kalman filtering for speech enhancement
    Wu, WR
    Chen, PC
    Chang, HT
    Kuo, CH
    [J]. ICSP '98: 1998 FOURTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, PROCEEDINGS, VOLS I AND II, 1998, : 682 - 685
  • [10] Frame-level speech enhancement based on Wasserstein GAN
    Peng, Chuan
    Lan, Tian
    Li, Meng
    Li, Sen
    Liu, Qiao
    [J]. ELEVENTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS, 2019, 11384