WEIGHTED SPEECH DISTORTION LOSSES FOR NEURAL-NETWORK-BASED REAL-TIME SPEECH ENHANCEMENT

Cited by: 0
Authors
Xia, Yangyang [1 ]
Braun, Sebastian [2 ]
Reddy, Chandan K. A. [2 ]
Dubey, Harishchandra [2 ]
Cutler, Ross [2 ]
Tashev, Ivan [2 ]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Microsoft Corp, Redmond, WA 98052 USA
Keywords
Real-time speech enhancement; recurrent neural networks; loss function; speech distortion; mean opinion score; PERCEPTUAL EVALUATION; NOISE
DOI
10.1109/icassp40776.2020.9054254
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper investigates several aspects of training a recurrent neural network (RNN) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement. Specifically, we focus on an RNN that enhances short-time speech spectra on a single-frame-in, single-frame-out basis, a framework adopted by most classical signal processing methods. We propose two novel mean-squared-error-based learning objectives that enable separate control over the importance of speech distortion versus noise reduction. The proposed loss functions are evaluated by widely accepted objective quality and intelligibility measures and compared to other competitive online methods. In addition, we study the impact of feature normalization and varying batch sequence lengths on the objective quality of enhanced speech. Finally, we show subjective ratings for the proposed approach and a state-of-the-art real-time RNN-based method.
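The abstract does not state the loss formulas, so the following is only a minimal sketch of one way an MSE-based objective could weight speech distortion against noise reduction. It assumes a per-bin gain is applied separately to clean-speech and noise magnitudes; the function name `weighted_sd_loss`, its arguments, and the weight `alpha` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def weighted_sd_loss(
    gain: Sequence[float],
    speech_mag: Sequence[float],
    noise_mag: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Illustrative weighted loss (an assumption, not the paper's definition).

    gain       : per-bin gain predicted by the network, typically in [0, 1]
    speech_mag : clean-speech magnitude per bin
    noise_mag  : noise magnitude per bin
    alpha      : weight in [0, 1]; larger values penalize speech distortion more
    """
    if not (len(gain) == len(speech_mag) == len(noise_mag)):
        raise ValueError("all inputs must have the same length")
    if len(gain) == 0:
        raise ValueError("inputs must not be empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    n = len(gain)
    # Speech-distortion term: how far the gained speech is from the clean speech.
    speech_distortion = sum((s - g * s) ** 2 for g, s in zip(gain, speech_mag)) / n
    # Noise-reduction term: energy of the noise that survives the gain.
    residual_noise = sum((g * v) ** 2 for g, v in zip(gain, noise_mag)) / n
    return alpha * speech_distortion + (1.0 - alpha) * residual_noise


if __name__ == "__main__":
    # A unit gain leaves speech undistorted but passes all of the noise.
    print(weighted_sd_loss([1.0, 1.0], [2.0, 4.0], [1.0, 3.0], alpha=0.5))
```

In this sketch, setting `alpha` close to 1 favors preserving speech at the cost of more residual noise, while values close to 0 favor aggressive noise suppression.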
Pages: 871-875
Page count: 5
Related papers
50 in total
  • [1] A MODULATION-DOMAIN LOSS FOR NEURAL-NETWORK-BASED REAL-TIME SPEECH ENHANCEMENT
    Vuong, Tyler
    Xia, Yangyang
    Stern, Richard M.
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 6643 - 6647
  • [2] Real-Time Speech Enhancement Based on Convolutional Recurrent Neural Network
    Girirajan, S.
    Pandian, A.
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 35 (02): 1987 - 2001
  • [3] INCORPORATING REAL-WORLD NOISY SPEECH IN NEURAL-NETWORK-BASED SPEECH ENHANCEMENT SYSTEMS
    Xia, Yangyang
    Xu, Buye
    Kumar, Anurag
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021: 564 - 570
  • [4] A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement
    Tan, Ke
    Wang, DeLiang
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 3229 - 3233
  • [5] TCNN: TEMPORAL CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN
    Pandey, Ashutosh
    Wang, DeLiang
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6875 - 6879
  • [6] DENSELY CONNECTED NEURAL NETWORK WITH DILATED CONVOLUTIONS FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN
    Pandey, Ashutosh
    Wang, DeLiang
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 6629 - 6633
  • [7] A Real-Time Convolutional Neural Network Based Speech Enhancement for Hearing Impaired Listeners Using Smartphone
    Bhat, Gautam S.
    Shankar, Nikhil
    Reddy, Chandan K. A.
    Panahi, Issa M. S.
    [J]. IEEE ACCESS, 2019, 7 : 78421 - 78433
  • [8] Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model
    Xue, Cheng
    Huang, Weilong
    Chen, Weiguang
    Feng, Jinwei
    [J]. INTERSPEECH 2021, 2021: 1862 - 1866
  • [9] FSCNet: Feature-Specific Convolution Neural Network for Real-Time Speech Enhancement
    Cheng, Longbiao
    Li, Junfeng
    Yan, Yonghong
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1958 - 1962
  • [10] Improved Modulation-Domain Loss for Neural-Network-based Speech Enhancement
    Vuong, Tyler
    Stern, Richard M.
    [J]. INTERSPEECH 2022, 2022: 206 - 210