Improved Modulation-Domain Loss for Neural-Network-based Speech Enhancement

Cited by: 1
Authors
Vuong, Tyler [1]
Stern, Richard M. [1, 2]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Source
Keywords
speech enhancement; spectro-temporal receptive fields
DOI
10.21437/Interspeech.2022-11082
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We describe an improved modulation-domain loss for deep learning-based speech enhancement (SE) systems. We use a simple self-supervised speech reconstruction task to learn a set of spectro-temporal receptive fields (STRFs). As in the recently developed spectro-temporal modulation error, the learned STRFs are used to calculate a weighted mean-squared error in the modulation domain for training a speech enhancement system. Experiments show that training the SE systems with the improved modulation-domain loss consistently improves objective predictions of speech quality and intelligibility. Additionally, we show that the SE systems improve the word error rate of a state-of-the-art automatic speech recognition system at low SNRs.
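As a rough illustration of the idea in the abstract, the sketch below filters clean and enhanced spectrograms with a bank of STRF kernels and computes a weighted mean-squared error between the resulting modulation-domain representations. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `apply_strfs` and `modulation_loss`, the random kernels standing in for learned STRFs, and the choice of weighting (normalizing each kernel's error by the clean signal's modulation energy) are all hypothetical.

```python
import numpy as np


def apply_strfs(spec, kernels):
    """Valid-mode 2-D cross-correlation of a spectrogram with each kernel.

    spec:    (F, T) array, e.g. a log-magnitude spectrogram.
    kernels: (K, kf, kt) array; stands in for learned STRFs (assumption).
    Returns a (K, F - kf + 1, T - kt + 1) array of modulation-domain outputs.
    """
    # windows has shape (F - kf + 1, T - kt + 1, kf, kt)
    windows = np.lib.stride_tricks.sliding_window_view(spec, kernels.shape[1:])
    return np.einsum("ftij,kij->kft", windows, kernels)


def modulation_loss(clean_spec, enhanced_spec, kernels, eps=1e-8):
    """Weighted MSE in the modulation domain.

    Assumed weighting: each kernel's squared error is divided by the
    clean signal's energy for that kernel, then averaged over kernels.
    """
    clean_mod = apply_strfs(clean_spec, kernels)
    enh_mod = apply_strfs(enhanced_spec, kernels)
    err = np.sum((clean_mod - enh_mod) ** 2, axis=(1, 2))
    energy = np.sum(clean_mod ** 2, axis=(1, 2))
    return float(np.mean(err / (energy + eps)))


# Toy usage with random data in place of real spectrograms and learned STRFs.
rng = np.random.default_rng(0)
clean = rng.standard_normal((40, 100))      # 40 frequency bins, 100 frames
enhanced = clean + 0.1 * rng.standard_normal(clean.shape)
strfs = rng.standard_normal((4, 5, 9))      # 4 hypothetical kernels
loss = modulation_loss(clean, enhanced, strfs)
```

In a training setup the same computation would be written in a differentiable framework so that gradients flow back to the enhancement network; NumPy is used here only to keep the sketch self-contained.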
Pages: 206-210
Page count: 5