Optimizing the Perceptual Quality of Time-Domain Speech Enhancement with Reinforcement Learning

Cited by: 7
Authors
Hao, Xiang [1]
Xu, Chenglin [2]
Xie, Lei [1]
Li, Haizhou [2]
Affiliations
[1] Northwestern Polytechnical University, School of Computer Science, Xi'an 710000, People's Republic of China
[2] National University of Singapore, Department of Electrical and Computer Engineering, Singapore
Funding
National Research Foundation, Singapore
Keywords
Measurement; Training; Convolution; Heuristic algorithms; Reinforcement learning; Speech enhancement; Filtering algorithms; Neural networks; Dynamic filter; Noise
DOI
10.26599/TST.2021.9010048
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In neural speech enhancement, there is a mismatch between the training objective, typically the Mean-Square Error (MSE), and the metrics used to evaluate perceptual quality, such as the Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI). We propose a novel reinforcement learning algorithm and network architecture that incorporate a non-differentiable perceptual quality metric into the objective function through a dynamic filter module. Unlike conventional dynamic filter implementations, which generate the convolution kernel directly, we use a filter generation agent to predict the probability density function of a multivariate Gaussian distribution, from which the convolution kernel is sampled. Experimental results show that the proposed reinforcement learning method clearly improves perceptual quality over supervised learning methods trained with the MSE objective.
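The core idea in the abstract, sampling a convolution kernel from a Gaussian and training the distribution's parameters with a policy gradient because the reward (e.g., PESQ) cannot be differentiated, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: a single global kernel mean stands in for the filter generation agent network, the standard deviation is fixed, the batch-mean reward is used as a baseline, and the hypothetical `quantized_snr_reward` (SNR rounded to 0.1 dB) stands in for PESQ. The names `apply_dynamic_filter`, `quantized_snr_reward`, and `reinforce_step` are illustrative only.

```python
import numpy as np


def apply_dynamic_filter(noisy, kernel):
    """Filter the time-domain signal with one sampled FIR kernel.

    mode="same" keeps the output aligned with the input for odd-length kernels.
    """
    return np.convolve(noisy, kernel, mode="same")


def quantized_snr_reward(enhanced, clean):
    """Hypothetical stand-in for a non-differentiable metric such as PESQ.

    SNR in dB rounded to 0.1 dB: piecewise constant, so it offers no useful
    gradient with respect to the kernel and must be treated as a black box.
    """
    err = clean - enhanced
    snr = 10.0 * np.log10(np.sum(clean ** 2) / (np.sum(err ** 2) + 1e-12))
    return float(np.round(snr, 1))


def reinforce_step(mu, sigma, noisy, clean, rng,
                   reward_fn=quantized_snr_reward, n_samples=32, lr=0.002):
    """One REINFORCE update of the mean of an isotropic Gaussian kernel policy."""
    # Sample candidate kernels k ~ N(mu, sigma^2 I).
    eps = rng.standard_normal((n_samples, mu.size))
    kernels = mu + sigma * eps
    # Score each sampled kernel with the black-box reward.
    rewards = np.array([reward_fn(apply_dynamic_filter(noisy, k), clean)
                        for k in kernels])
    # Subtract the batch-mean reward as a simple variance-reducing baseline.
    advantages = rewards - rewards.mean()
    # grad_mu log N(k; mu, sigma^2 I) = (k - mu) / sigma^2 = eps / sigma.
    grad = (advantages[:, None] * eps / sigma).mean(axis=0)
    # Gradient ascent on the expected reward.
    return mu + lr * grad, float(rewards.mean())


# Toy usage: a low-frequency sinusoid stands in for clean speech.
rng = np.random.default_rng(0)
t = np.arange(2000)
clean = np.sin(2 * np.pi * t / 200)
noisy = clean + 0.5 * rng.standard_normal(t.size)

mu = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # start from the identity filter
r_init = quantized_snr_reward(apply_dynamic_filter(noisy, mu), clean)
for _ in range(300):
    mu, _ = reinforce_step(mu, 0.1, noisy, clean, rng)
r_final = quantized_snr_reward(apply_dynamic_filter(noisy, mu), clean)
print(r_init, r_final, np.round(mu, 2))
```

Because the reward is only evaluated and never differentiated, any black-box metric could be passed in through `reward_fn`. The price is a noisy gradient estimate, which is why several kernels are sampled for each update.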
Pages: 939-947
Number of pages: 9
Journal: Tsinghua Science and Technology, 2022, 27 (6): 939-947