Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality

Cited by: 18
Authors
Williamson, Donald S. [1]
Wang, Yuxuan [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Source
Keywords
NORMAL-HEARING; NOISE; FACTORIZATION; INTELLIGIBILITY; SEPARATION; ALGORITHM;
DOI
10.1121/1.4928612
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403
Abstract
As a means of speech separation, time-frequency masking applies a gain function to the time-frequency representation of noisy speech. Nonnegative matrix factorization (NMF), by contrast, addresses separation by linearly combining basis vectors from speech and noise models to approximate noisy speech. This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. An ideal ratio mask is estimated, which separates speech from noise with reasonable sound quality. A deep neural network then approximates clean speech by estimating activation weights from the ratio-masked speech, where the weights linearly combine elements from an NMF speech model. Systematic comparisons using objective metrics, including the perceptual evaluation of speech quality, show that the proposed algorithm achieves higher speech quality than related masking and NMF methods. In addition, a listening test was performed, and its results show that the output of the proposed algorithm is preferred over the comparison systems in terms of speech quality. © 2015 Acoustical Society of America.
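The two building blocks named in the abstract, ratio masking and a linear combination of NMF basis vectors, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `beta` exponent, and the toy matrices are assumptions made for illustration, and the deep neural network that the paper uses to estimate activations is replaced here by classic multiplicative updates with a fixed basis, purely to show what "activation weights" means.

```python
# Illustrative sketch only; names and parameters are hypothetical.
EPS = 1e-12  # guards against division by zero


def ratio_mask(speech_power, noise_power, beta=1.0):
    """Ratio mask per time-frequency unit: (S / (S + N)) ** beta, in [0, 1].

    Computing it from the true speech and noise powers gives an "ideal" mask;
    beta is an assumed tuning exponent, not a value taken from the paper.
    """
    return [[(s / (s + n + EPS)) ** beta for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_power, noise_power)]


def apply_mask(mask, noisy):
    """Time-frequency masking: an elementwise gain on the noisy representation."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, noisy)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf_activations(v, w, n_iter=200):
    """Estimate nonnegative activations H so that V is approximately W H.

    The basis W (frequency x basis) stays fixed. H is refined with the
    multiplicative update for the Euclidean cost:
    H <- H * (W^T V) / (W^T W H).
    """
    wt = transpose(w)
    wtv = matmul(wt, v)
    wtw = matmul(wt, w)
    h = [[1.0] * len(v[0]) for _ in range(len(w[0]))]
    for _ in range(n_iter):
        wtwh = matmul(wtw, h)
        h = [[h[k][t] * wtv[k][t] / (wtwh[k][t] + EPS)
              for t in range(len(h[0]))] for k in range(len(h))]
    return h
```

In this sketch, `apply_mask(ratio_mask(S, N), Y)` yields the masked spectrogram, and `matmul(W, nmf_activations(V, W))` rebuilds a spectrogram from a speech basis `W`. In the paper, the activations come from a trained network that takes the ratio-masked speech as input rather than from this iterative solver.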
Pages: 1399-1407
Page count: 9