Optimizing the Perceptual Quality of Time-Domain Speech Enhancement with Reinforcement Learning

Authors
Xiang Hao [1]
Chenglin Xu [2]
Lei Xie [1]
Haizhou Li [2]
Affiliations
[1] School of Computer Science, Northwestern Polytechnical University
[2] Department of Electrical and Computer Engineering, National University of Singapore
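Funding
National Research Foundation, Singapore
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TN912.35 [Speech enhancement]; TP181 [Automated reasoning, machine learning]
Discipline classification codes
0711; 081104; 0812; 0835; 1405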
Abstract
In neural speech enhancement, a mismatch exists between the training objective, i.e., the mean-square error (MSE), and perceptual quality evaluation metrics, i.e., perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). We propose a novel reinforcement learning algorithm and network architecture that incorporate a non-differentiable perceptual quality evaluation metric into the objective function using a dynamic filter module. Unlike the traditional dynamic filter implementation, which directly generates a convolution kernel, we use a filter generation agent to predict the probability density function of a multivariate Gaussian distribution, from which we sample the convolution kernel. Experimental results show that the proposed reinforcement learning method clearly improves perceptual quality over supervised learning methods trained with the MSE objective function.
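The idea of sampling a convolution kernel from a predicted Gaussian and scoring the result with a non-differentiable metric can be illustrated with a generic score-function (REINFORCE-style) update. The sketch below is not the authors' algorithm or network. It assumes a diagonal covariance, a fixed mean vector instead of a neural filter generation agent, and a placeholder reward (negative MSE) standing in for a black-box metric such as PESQ. All function names are hypothetical.

```python
import numpy as np

def black_box_quality(enhanced, clean):
    # Placeholder reward (assumption): stands in for a non-differentiable
    # metric such as PESQ. Higher is better; here it is just negative MSE.
    return -float(np.mean((enhanced - clean) ** 2))

def sample_kernel(mean, log_std, rng):
    # Draw one kernel from a diagonal Gaussian N(mean, diag(std^2)).
    # The diagonal covariance is a simplifying assumption of this sketch.
    std = np.exp(log_std)
    return mean + std * rng.standard_normal(mean.shape)

def reinforce_step(mean, log_std, noisy, clean, rng, lr=0.05, n_samples=16):
    # One score-function update of the Gaussian mean.
    std = np.exp(log_std)
    kernels = [sample_kernel(mean, log_std, rng) for _ in range(n_samples)]
    # Filter the noisy signal with each sampled kernel and score the output.
    rewards = np.array([
        black_box_quality(np.convolve(noisy, k, mode="same"), clean)
        for k in kernels
    ])
    baseline = rewards.mean()  # mean-reward baseline to reduce variance
    grad = np.zeros_like(mean)
    for k, r in zip(kernels, rewards):
        # Gradient of log N(k; mean, std^2) w.r.t. mean is (k - mean) / std^2.
        grad += (r - baseline) * (k - mean) / std ** 2
    grad /= n_samples
    # Gradient ascent on expected reward; the metric itself is never differentiated.
    return mean + lr * grad, float(baseline)
```

Only the log-probability of the sampled kernel is differentiated, so the reward function can be any scalar score computed outside the gradient path. In a full system the mean and standard deviation would presumably be outputs of a network conditioned on the input speech rather than free parameters as here.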
Pages: 939-947
Number of pages: 9