Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality

Cited by: 18
Authors
Williamson, Donald S. [1]
Wang, Yuxuan [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
NORMAL-HEARING; NOISE; FACTORIZATION; INTELLIGIBILITY; SEPARATION; ALGORITHM
DOI
10.1121/1.4928612
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
As a means of speech separation, time-frequency masking applies a gain function to the time-frequency representation of noisy speech. On the other hand, nonnegative matrix factorization (NMF) addresses separation by linearly combining basis vectors from speech and noise models to approximate noisy speech. This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. An ideal ratio mask is estimated, which separates speech from noise with reasonable sound quality. A deep neural network then approximates clean speech by estimating activation weights from the ratio-masked speech, where the weights linearly combine elements from an NMF speech model. Systematic comparisons using objective metrics, including the perceptual evaluation of speech quality, show that the proposed algorithm achieves higher speech quality than related masking and NMF methods. In addition, a listening test was performed and its results show that the output of the proposed algorithm is preferred over the comparison systems in terms of speech quality. © 2015 Acoustical Society of America.
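The two building blocks named in the abstract, a ratio mask applied as a gain and a nonnegative linear combination of speech basis vectors, can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: the mask uses one common definition (speech power over total power, raised to an exponent `beta`), the function names are hypothetical, and the activation weights are supplied by hand, whereas in the paper they are estimated by a deep neural network.

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Per-bin gain in [0, 1] computed from clean-speech and noise power.

    Inputs are time-frequency grids given as nested lists (frames x bins).
    The (S / (S + N)) ** beta form is an assumed, commonly used definition.
    """
    return [
        [(s / (s + n)) ** beta if (s + n) > 0 else 0.0
         for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def apply_mask(mask, noisy_magnitude):
    """Apply the mask as an element-wise gain on the noisy magnitudes."""
    return [
        [m * y for m, y in zip(m_row, y_row)]
        for m_row, y_row in zip(mask, noisy_magnitude)
    ]


def combine_bases(bases, activations):
    """Approximate one spectral frame as sum_k activations[k] * bases[k].

    `bases` is a list of K basis vectors (each of length n_bins) and
    `activations` holds K nonnegative weights, as in an NMF model.
    """
    if any(a < 0 for a in activations):
        raise ValueError("NMF activations must be nonnegative")
    n_bins = len(bases[0])
    return [sum(a * b[f] for a, b in zip(activations, bases))
            for f in range(n_bins)]


# Toy values, chosen only for illustration.
speech_power = [[9.0, 0.0], [1.0, 4.0]]
noise_power = [[16.0, 4.0], [3.0, 0.0]]
mask = ideal_ratio_mask(speech_power, noise_power)
masked = apply_mask(mask, [[10.0, 2.0], [4.0, 2.0]])

speech_bases = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
frame = combine_bases(speech_bases, [0.5, 2.0])
```

In this sketch the mask stage and the basis-combination stage are independent; the approach described in the abstract connects them by feeding the ratio-masked speech to a network that outputs the activation weights.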
Pages: 1399-1407
Page count: 9
Related papers
50 in total
  • [1] DEEP NEURAL NETWORKS FOR ESTIMATING SPEECH MODEL ACTIVATIONS
    Williamson, Donald S.
    Wang, Yuxuan
    Wang, DeLiang
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5113 - 5117
  • [2] PERCEPTUAL IMPROVEMENT OF DEEP NEURAL NETWORKS FOR MONAURAL SPEECH ENHANCEMENT
    Han, Wei
    Zhang, Xiongwei
    Sun, Meng
    Shi, Wenhua
    Chen, Xushan
    Hu, Yonggang
    [J]. 2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
  • [3] Alternating Optimization Method Based on Nonnegative Matrix Factorizations for Deep Neural Networks
    Sakurai, Tetsuya
    Imakura, Akira
    Inoue, Yuto
    Futamura, Yasunori
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2016, PT IV, 2016, 9950 : 354 - 362
  • [4] Deep Transductive Nonnegative Matrix Factorization for Speech Separation
    Liu, Yalin
    Guan, Naiyang
    Liu, Jie
    [J]. 2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 249 - 254
  • [5] Perceptual Weighting Deep Neural Networks for Single-channel Speech Enhancement
    Han, Wei
    Zhang, Xiongwei
    Min, Gang
    Zhou, Xingyu
    Zhang, Wei
    [J]. PROCEEDINGS OF THE 2016 12TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2016, : 446 - 450
  • [6] Explaining the Behavior of Neuron Activations in Deep Neural Networks
    Wang, Longwei
    Wang, Chengfei
    Li, Yupeng
    Wang, Rui
    [J]. AD HOC NETWORKS, 2021, 111
  • [7] Perceptual Hashing of Deep Convolutional Neural Networks for Model Copy Detection
    Chen, Haozhe
    Zhou, Hang
    Zhang, Jie
    Chen, Dongdong
    Zhang, Weiming
    Chen, Kejiang
    Hua, Gang
    Yu, Nenghai
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (03)
  • [8] Nonnegative periodic dynamics of delayed Cohen-Grossberg neural networks with discontinuous activations
    He, Xiangnan
    Lu, Wenlian
    Chen, Tianping
    [J]. NEUROCOMPUTING, 2010, 73 (13-15) : 2765 - 2772
  • [9] Training Deep Photonic Convolutional Neural Networks With Sinusoidal Activations
    Passalis, Nikolaos
    Mourgias-Alexandris, George
    Tsakyridis, Apostolos
    Pleros, Nikos
    Tefas, Anastasios
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2021, 5 (03) : 384 - 393
  • [10] Scaling Deep Spiking Neural Networks with Binary Stochastic Activations
    Roy, Deboleena
    Chakraborty, Indranil
    Roy, Kaushik
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON COGNITIVE COMPUTING (IEEE ICCC 2019), 2019, : 50 - 58