Optimizing the Perceptual Quality of Time-Domain Speech Enhancement with Reinforcement Learning

Cited by: 7

Authors:

Hao, Xiang [1]

Xu, Chenglin [2]

Xie, Lei [1]

Li, Haizhou [2]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710000, Peoples R China

[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 710129, Singapore

Source:

TSINGHUA SCIENCE AND TECHNOLOGY | 2022 / Vol. 27 / Issue 06

Funding:

National Research Foundation, Singapore;

Keywords:

Measurement; Training; Convolution; Heuristic algorithms; Reinforcement learning; Speech enhancement; Filtering algorithms; speech enhancement; neural networks; dynamic filter; reinforcement learning; NOISE;

DOI:

10.26599/TST.2021.9010048

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

In neural speech enhancement, a mismatch exists between the training objective, i.e., Mean-Square Error (MSE), and perceptual quality evaluation metrics, i.e., perceptual evaluation of speech quality and short-time objective intelligibility. We propose a novel reinforcement learning algorithm and network architecture, which incorporate a non-differentiable perceptual quality evaluation metric into the objective function using a dynamic filter module. Unlike the traditional dynamic filter implementation that directly generates a convolution kernel, we use a filter generation agent to predict the probability density function of a multivariate Gaussian distribution, from which we sample the convolution kernel. Experimental results show that the proposed reinforcement learning method clearly improves the perceptual quality over other supervised learning methods with the MSE objective function.
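The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a diagonal Gaussian with fixed standard deviations, updates only the mean with a plain score-function (REINFORCE) estimate, and keeps the mean as a free parameter vector rather than the output of a network. The function `stand_in_metric` is a hypothetical placeholder for a non-differentiable score such as PESQ, and all function names and hyperparameters here are illustrative.

```python
import math
import random


def sample_kernel(mean, log_std, rng):
    """Draw one kernel from a diagonal Gaussian N(mean, diag(exp(log_std)^2))."""
    return [rng.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)]


def apply_kernel(signal, kernel):
    """'Valid' 1-D correlation of the signal with the sampled kernel."""
    n = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(n))
            for i in range(len(signal) - n + 1)]


def stand_in_metric(enhanced, clean):
    """Hypothetical black-box quality score (higher is better).

    Assumption: negative MSE stands in for a metric such as PESQ. The update
    below never differentiates through this function.
    """
    err = sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(enhanced)
    return -err


def reinforce_step(mean, log_std, noisy, clean, rng, n_samples=16, lr=0.05):
    """One score-function update of the Gaussian mean, with a mean-reward baseline."""
    kernels = [sample_kernel(mean, log_std, rng) for _ in range(n_samples)]
    rewards = [stand_in_metric(apply_kernel(noisy, k), clean) for k in kernels]
    baseline = sum(rewards) / n_samples
    new_mean = list(mean)
    for j in range(len(mean)):
        var = math.exp(2.0 * log_std[j])
        # d/d(mean_j) of log N(k | mean, var) is (k_j - mean_j) / var.
        grad = sum((r - baseline) * (k[j] - mean[j]) / var
                   for k, r in zip(kernels, rewards)) / n_samples
        new_mean[j] = mean[j] + lr * grad  # gradient ascent on expected reward
    return new_mean, baseline


# Toy usage: a sine wave corrupted by Gaussian noise and a 5-tap kernel.
rng = random.Random(0)
clean = [math.sin(0.2 * t) for t in range(200)]
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
mean, log_std = [0.0] * 5, [math.log(0.1)] * 5
for _ in range(50):
    mean, avg_reward = reinforce_step(mean, log_std, noisy, clean, rng)
```

Because the reward enters only as a scalar weight on the log-probability gradient, the metric can be any black-box scorer. In the paper's setting the distribution parameters would come from the filter generation agent rather than a free vector.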

Pages: 939-947

Page count: 9

