A new perceptually weighted cost function in deep neural network based speech enhancement systems

Cited by: 1
Authors
Goli, Peyman [1]
Affiliation
[1] Khavaran Inst Higher Educ, Mashhad, Razavi Khorasan, Iran
Keywords
Deep neural networks; speech enhancement; psychoacoustic models; speech intelligibility; NOISE;
DOI
10.1080/21695717.2019.1603948
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification codes
100104 ; 100213 ;
Abstract
Improving speech intelligibility is an important task for human perception in telecommunication systems and hearing aids when speech is degraded by background noise. Although deep neural network (DNN) based learning architectures that use mean square error (MSE) as the cost function have been found to be very successful in speech enhancement, they typically attempt to enhance speech quality by uniformly optimizing the separation of a target speech signal from a noisy observation over all frequency bands. In this work, we propose a new cost function, based on a psychoacoustic model, that focuses further on speech intelligibility improvement. The band-importance function, which is a principal component of the speech intelligibility index (SII), is used to determine the relative contribution of each frequency band to speech intelligibility in the learning algorithm. In addition, we augment the network with a signal-to-noise ratio (SNR) estimate to improve the generalization of the method to unseen noisy conditions. The performance of the proposed MSE cost function is compared with that of the conventional MSE cost function under the same conditions. Our approach shows better performance in objective speech intelligibility measures such as coherence SII (CSII) and short-time objective intelligibility (STOI), while mitigating quality scores in the perceptual evaluation of speech quality (PESQ) and speech distortion (SD) measures.
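The idea of weighting the MSE by band importance can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `weighted_mse`, the (frames, bands) array layout, and the example weights are all assumptions made for illustration, and the weights shown are placeholders rather than actual SII band-importance values.

```python
import numpy as np

def weighted_mse(estimate, target, band_weights):
    """Band-weighted mean square error (illustrative sketch only).

    estimate, target: arrays of shape (frames, bands), e.g. spectral features.
    band_weights: one non-negative weight per band (hypothetical values here,
    standing in for a band-importance function).
    """
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)
    w = np.asarray(band_weights, dtype=float)
    w = w / w.sum()                      # normalise weights to sum to 1
    sq_err = (estimate - target) ** 2    # per-frame, per-band squared error
    # Weighted sum over bands, then average over frames.
    return float(np.mean(np.sum(w * sq_err, axis=1)))

# Toy data: 2 frames, 3 bands; the error sits only in the first band.
target = np.zeros((2, 3))
estimate = np.array([[1.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0]])

uniform_loss = weighted_mse(estimate, target, [1.0, 1.0, 1.0])
skewed_loss = weighted_mse(estimate, target, [0.0, 1.0, 1.0])
```

With uniform weights the function reduces to the conventional MSE, so the weighting only changes the cost when some bands are assigned more importance than others.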
Pages: 191-196
Page count: 6