Optimizing the Perceptual Quality of Time-Domain Speech Enhancement with Reinforcement Learning

Cited by: 7
Authors
Hao, Xiang [1]
Xu, Chenglin [2]
Xie, Lei [1]
Li, Haizhou [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710000, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 710129, Singapore
Funding
National Research Foundation, Singapore;
Keywords
Measurement; Training; Convolution; Heuristic algorithms; Reinforcement learning; Speech enhancement; Filtering algorithms; speech enhancement; neural networks; dynamic filter; reinforcement learning; NOISE;
DOI
10.26599/TST.2021.9010048
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In neural speech enhancement, a mismatch exists between the training objective, i.e., Mean-Square Error (MSE), and perceptual quality evaluation metrics, i.e., Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI). We propose a novel reinforcement learning algorithm and network architecture, which incorporate a non-differentiable perceptual quality evaluation metric into the objective function using a dynamic filter module. Unlike the traditional dynamic filter implementation that directly generates a convolution kernel, we use a filter generation agent to predict the probability density function of a multivariate Gaussian distribution, from which we sample the convolution kernel. Experimental results show that the proposed reinforcement learning method clearly improves the perceptual quality over other supervised learning methods with the MSE objective function.
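The sampling idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a diagonal Gaussian whose mean and log standard deviation are free parameters (in the paper a filter generation agent predicts the distribution), a plain score-function (REINFORCE-style) update, and a rounded SNR as a stand-in for a non-differentiable perceptual metric such as PESQ or STOI. All function names here (`sample_kernel`, `apply_kernel`, `placeholder_reward`, `reinforce_grads`, `train_demo`) are hypothetical.

```python
import numpy as np

def sample_kernel(mu, log_sigma, rng):
    """Draw a convolution kernel from a diagonal Gaussian N(mu, sigma^2)."""
    sigma = np.exp(log_sigma)
    return mu + sigma * rng.standard_normal(mu.shape)

def apply_kernel(noisy, kernel):
    """Filter the waveform; mode='same' keeps the input length."""
    return np.convolve(noisy, kernel, mode="same")

def placeholder_reward(enhanced, clean):
    """Stand-in reward (assumption): SNR in dB, rounded to 0.1 dB.
    This is NOT PESQ or STOI; rounding only mimics a metric
    that provides no useful gradient."""
    err = np.sum((clean - enhanced) ** 2) + 1e-12
    snr_db = 10.0 * np.log10(np.sum(clean ** 2) / err)
    return float(np.round(snr_db, 1))

def reinforce_grads(kernel, mu, log_sigma, advantage):
    """Gradient of advantage * log N(kernel; mu, sigma^2)
    with respect to mu and log_sigma (score-function estimator)."""
    sigma = np.exp(log_sigma)
    z = (kernel - mu) / sigma
    grad_mu = advantage * z / sigma
    grad_log_sigma = advantage * (z ** 2 - 1.0)
    return grad_mu, grad_log_sigma

def train_demo(steps=20, lr=1e-3, seed=0):
    """Toy loop on a synthetic sine wave; illustrative only."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, 400)
    clean = np.sin(2.0 * np.pi * 5.0 * t)
    noisy = clean + 0.3 * rng.standard_normal(t.shape)

    mu = np.zeros(9)
    mu[4] = 1.0                      # start from an identity filter
    log_sigma = np.full(9, -3.0)     # small initial exploration noise
    # Initialise the baseline with the reward of the mean kernel.
    baseline = placeholder_reward(apply_kernel(noisy, mu), clean)

    for _ in range(steps):
        kernel = sample_kernel(mu, log_sigma, rng)
        reward = placeholder_reward(apply_kernel(noisy, kernel), clean)
        g_mu, g_ls = reinforce_grads(kernel, mu, log_sigma, reward - baseline)
        mu += lr * g_mu              # gradient ascent on expected reward
        log_sigma += lr * g_ls
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    return mu, log_sigma
```

The reward only enters the update as a scalar weight on the log-probability gradient, which is why the metric itself does not need to be differentiable. A real system would replace `placeholder_reward` with an actual PESQ or STOI implementation and the free parameters with a network's outputs.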
Pages: 939-947
Page count: 9
Related papers
50 in total
  • [21] Improved Speech Enhancement using a Complex-Domain GAN with Fused Time-Domain and Time-frequency Domain Constraints
    Dang, Feng
    Zhang, Pengyuan
    Chen, Hangting
    [J]. INTERSPEECH 2021, 2021, : 2721 - 2725
  • [22] CPTNN: CROSS-PARALLEL TRANSFORMER NEURAL NETWORK FOR TIME-DOMAIN SPEECH ENHANCEMENT
    Wang, Kai
    He, Bengbeng
    Zhu, Wei-Ping
    [J]. 2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [23] Two-Branch Network with Selective Kernel Convolution for Time-Domain Speech Enhancement
    Li, Hui
    Huang, Zhihua
    Guo, Chuangjian
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 478 - 482
  • [24] Multi-channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder
    Tawara, Naohiro
    Kobayashi, Tetsunori
    Ogawa, Tetsuji
    [J]. INTERSPEECH 2019, 2019, : 86 - 90
  • [25] Time-Domain Joint Training Strategies of Speech Enhancement and Intent Classification Neural Models
    Ali, Mohamed Nabih
    Falavigna, Daniele
    Brutti, Alessio
    [J]. SENSORS, 2022, 22 (01)
  • [26] Group Multi-Scale convolutional Network for Monaural Speech Enhancement in Time-domain
    Yu, Juntao
    Jiang, Ting
    Yu, Jiacheng
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 646 - 650
  • [27] Single-channel deep time-domain speech enhancement networks for cabin environments
    Zhang, Lin
    Wang, Haitao
    Yang, Shuang
    Zeng, Xiangyang
    Chen, Ke'an
    [J]. Shengxue Xuebao/Acta Acustica, 2023, 48 (04): 890 - 900
  • [28] TIME-DOMAIN AUDIO-VISUAL SPEECH SEPARATION ON LOW QUALITY VIDEOS
    Wu, Yifei
    Li, Chenda
    Bai, Jinfeng
    Wu, Zhongqin
    Qian, Yanmin
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 256 - 260
  • [29] REINFORCEMENT LEARNING BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION
    Shen, Yih-Liang
    Huang, Chao-Yuan
    Wang, Syu-Siang
    Tsao, Yu
    Wang, Hsin-Min
    Chi, Tai-Shih
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6750 - 6754
  • [30] SEGMENTATION OF SPEECH UTILIZING TIME-DOMAIN PROPERTIES OF SPEECH SIGNALS
    AKAMATSU, N
    NIKI, N
    TAKAHASHI, Y
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 64 : S179 - S179