Optimizing the Perceptual Quality of Time-Domain Speech Enhancement with Reinforcement Learning

Cited by: 7

Authors:

Hao, Xiang [1]

Xu, Chenglin [2]

Xie, Lei [1]

Li, Haizhou [2]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710000, Peoples R China

[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 710129, Singapore

Source:

TSINGHUA SCIENCE AND TECHNOLOGY | 2022 / Vol. 27 / Issue 06

Funding:

National Research Foundation, Singapore;

Keywords:

Measurement; Training; Convolution; Heuristic algorithms; Reinforcement learning; Speech enhancement; Filtering algorithms; speech enhancement; neural networks; dynamic filter; reinforcement learning; NOISE;

DOI:

10.26599/TST.2021.9010048

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In neural speech enhancement, a mismatch exists between the training objective, i.e., Mean-Square Error (MSE), and perceptual quality evaluation metrics, i.e., perceptual evaluation of speech quality and short-time objective intelligibility. We propose a novel reinforcement learning algorithm and network architecture, which incorporate a non-differentiable perceptual quality evaluation metric into the objective function using a dynamic filter module. Unlike the traditional dynamic filter implementation that directly generates a convolution kernel, we use a filter generation agent to predict the probability density function of a multivariate Gaussian distribution, from which we sample the convolution kernel. Experimental results show that the proposed reinforcement learning method clearly improves the perceptual quality over other supervised learning methods with the MSE objective function.
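The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal Gaussian over the kernel taps, a score-function (REINFORCE-style) gradient estimate, and a hypothetical `toy_reward` standing in for a perceptual metric. All function names and sizes below are illustrative.

```python
import numpy as np


def sample_kernel(mean, log_std, rng):
    """Sample a 1-D convolution kernel from a diagonal Gaussian.

    Returns the sampled kernel and its log-density under the Gaussian.
    A diagonal covariance is an assumption made for this sketch.
    """
    std = np.exp(log_std)
    kernel = mean + std * rng.standard_normal(mean.shape)
    log_prob = -0.5 * np.sum(
        ((kernel - mean) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi)
    )
    return kernel, log_prob


def apply_kernel(noisy, kernel):
    """Filter the noisy waveform with the sampled kernel."""
    return np.convolve(noisy, kernel, mode="same")


def toy_reward(enhanced, clean):
    """Hypothetical reward: negative MSE, used only as a placeholder.

    In the setting of the abstract the reward would be a
    non-differentiable perceptual metric treated as a black box.
    """
    return -float(np.mean((enhanced - clean) ** 2))


def score_function_grad_mean(kernel, mean, log_std, reward, baseline=0.0):
    """Score-function gradient estimate with respect to the Gaussian mean.

    Uses d/d(mean) log N(kernel; mean, std^2) = (kernel - mean) / std^2,
    so the reward itself never needs to be differentiated.
    """
    std = np.exp(log_std)
    return (reward - baseline) * (kernel - mean) / std**2


rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 8.0 * np.pi, 256))
noisy = clean + 0.3 * rng.standard_normal(256)

# Distribution parameters that an agent network would normally predict;
# here they are fixed by hand (identity kernel, small variance).
mean = np.zeros(9)
mean[4] = 1.0
log_std = np.full(9, -3.0)

kernel, log_prob = sample_kernel(mean, log_std, rng)
enhanced = apply_kernel(noisy, kernel)
reward = toy_reward(enhanced, clean)
grad = score_function_grad_mean(kernel, mean, log_std, reward)
```

Because the gradient flows through the log-density of the sampled kernel rather than through the reward, the reward function can be any scalar score, which is what makes a non-differentiable metric usable as a training signal in this kind of setup.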

Pages: 939-947

Number of pages: 9
