A MODULATION-DOMAIN LOSS FOR NEURAL-NETWORK-BASED REAL-TIME SPEECH ENHANCEMENT

Cited by: 6
Authors
Vuong, Tyler [1]
Xia, Yangyang [1]
Stern, Richard M. [1, 2]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
Real-time speech enhancement; spectro-temporal receptive field; loss functions; PERCEPTUAL EVALUATION; INDEX;
DOI
10.1109/ICASSP39728.2021.9414965
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We describe a modulation-domain loss function for deep-learning-based speech enhancement systems. Learnable spectro-temporal receptive fields (STRFs) were adapted to optimize for a speaker identification task. The learned STRFs were then used to calculate a weighted mean-squared error (MSE) in the modulation domain for training a speech enhancement system. Experiments showed that adding the modulation-domain MSE to the MSE in the spectro-temporal domain substantially improved the objective prediction of speech quality and intelligibility for real-time speech enhancement systems without incurring additional computation during inference.
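The combined loss described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed Gabor-like kernels from the hypothetical helper `gabor_strf` stand in for the STRFs that the paper learns on a speaker identification task, and the names `modulation_features`, `combined_loss`, `weights`, and `alpha` are assumptions introduced here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_strf(rate, scale, size=(9, 9)):
    """Hypothetical fixed Gabor-like STRF kernel, shape (time, frequency).

    `rate` and `scale` are temporal and spectral modulation frequencies in
    cycles per bin. The paper learns its STRFs; this is only a stand-in.
    """
    t = np.arange(size[0]) - size[0] // 2
    f = np.arange(size[1]) - size[1] // 2
    tt, ff = np.meshgrid(t, f, indexing="ij")
    envelope = np.outer(np.hanning(size[0]), np.hanning(size[1]))
    return envelope * np.cos(2 * np.pi * (rate * tt + scale * ff))


def modulation_features(spec, kernels):
    """Filter a (time, freq) spectrogram with each kernel.

    Assumes all kernels share one odd-sized shape, so zero padding yields
    an output with the same (time, freq) shape as the input.
    """
    kt, kf = kernels[0].shape
    padded = np.pad(spec, ((kt // 2, kt // 2), (kf // 2, kf // 2)))
    windows = sliding_window_view(padded, (kt, kf))  # (T, F, kt, kf)
    return np.stack([np.einsum("tfij,ij->tf", windows, k) for k in kernels])


def combined_loss(enhanced, clean, kernels, weights, alpha=1.0):
    """Spectro-temporal MSE plus a weighted modulation-domain MSE.

    `weights` (one per kernel) and `alpha` are assumed hyperparameters.
    """
    spectral_mse = np.mean((enhanced - clean) ** 2)
    diff = modulation_features(enhanced, kernels) - modulation_features(clean, kernels)
    per_kernel_mse = np.mean(diff ** 2, axis=(1, 2))
    modulation_mse = np.sum(weights * per_kernel_mse) / np.sum(weights)
    return spectral_mse + alpha * modulation_mse


# Toy usage with random arrays standing in for spectrograms.
rng = np.random.default_rng(0)
clean = rng.standard_normal((20, 16))
noisy = clean + 0.1 * rng.standard_normal((20, 16))
kernels = [gabor_strf(0.1, 0.0), gabor_strf(0.0, 0.2)]
weights = np.array([1.0, 2.0])
loss = combined_loss(noisy, clean, kernels, weights)
```

Because the STRF filtering appears only inside the loss, it runs during training alone; the enhancement network itself is unchanged, which is consistent with the abstract's statement that inference incurs no additional computation.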
Pages: 6643-6647
Number of pages: 5