Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality

Cited by: 18

Authors:

Williamson, Donald S. [1]

Wang, Yuxuan [1]

Wang, DeLiang [1,2]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2015 / Vol. 138 / Issue 03

Keywords:

NORMAL-HEARING; NOISE; FACTORIZATION; INTELLIGIBILITY; SEPARATION; ALGORITHM;

DOI:

10.1121/1.4928612

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

As a means of speech separation, time-frequency masking applies a gain function to the time-frequency representation of noisy speech. On the other hand, nonnegative matrix factorization (NMF) addresses separation by linearly combining basis vectors from speech and noise models to approximate noisy speech. This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. An ideal ratio mask is estimated, which separates speech from noise with reasonable sound quality. A deep neural network then approximates clean speech by estimating activation weights from the ratio-masked speech, where the weights linearly combine elements from an NMF speech model. Systematic comparisons using objective metrics, including the perceptual evaluation of speech quality, show that the proposed algorithm achieves higher speech quality than related masking and NMF methods. In addition, a listening test was performed and its results show that the output of the proposed algorithm is preferred over the comparison systems in terms of speech quality. © 2015 Acoustical Society of America.
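
The two stages the abstract describes (ratio masking, then a linear combination of NMF speech bases weighted by activations) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It rests on several assumptions: the mask is computed with one common ideal-ratio-mask definition (the square root of the speech-to-mixture power ratio), the toy spectrograms are random, speech and noise magnitudes are assumed to add, and a multiplicative-update solver with a fixed basis stands in for the deep neural network that the paper uses to estimate the activations. All function and variable names are hypothetical.

```python
import numpy as np


def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """Assumed IRM definition: sqrt(S / (S + N)) per time-frequency unit."""
    return np.sqrt(speech_power / (speech_power + noise_power + eps))


def estimate_activations(V, W, n_iter=200, eps=1e-12):
    """Find nonnegative H so that V is approximated by W @ H, with W fixed.

    Uses multiplicative updates for the Euclidean cost. This solver is a
    stand-in for the DNN-based activation estimator described in the abstract.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


# Toy magnitude spectrograms (random data, for illustration only).
rng = np.random.default_rng(1)
n_freq, n_frames, n_bases = 16, 20, 4
W_speech = rng.random((n_freq, n_bases))       # hypothetical NMF speech basis
speech_mag = W_speech @ rng.random((n_bases, n_frames))
noise_mag = 0.5 * rng.random((n_freq, n_frames))
noisy_mag = speech_mag + noise_mag             # assumption: magnitudes add

# Stage 1: apply the ratio mask as a gain on the noisy representation.
mask = ideal_ratio_mask(speech_mag**2, noise_mag**2)
masked_mag = mask * noisy_mag

# Stage 2: estimate activations from the masked speech, then rebuild the
# speech estimate as a linear combination of the speech basis vectors.
H_est = estimate_activations(masked_mag, W_speech)
speech_est = W_speech @ H_est
```

In this sketch the mask is computed from the known speech and noise, which is only possible on synthetic data; in the setting the abstract describes, the mask itself is estimated.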


Pages: 1399-1407

Page count: 9

50 items in total

[41] Speech emotion recognition with deep convolutional neural networks
Issa, Dias
Demirci, M. Fatih
Yazici, Adnan
[J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
[42] Noisy training for deep neural networks in speech recognition
Yin, Shi
Liu, Chao
Zhang, Zhiyong
Lin, Yiye
Wang, Dong
Tejedor, Javier
Zheng, Thomas Fang
Li, Yinguo
[J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015, : 1 - 14
[43] Speech Activity Detection Using Deep Neural Networks
Shahsavari, Sajad
Sameti, Hossein
Hadian, Hossein
[J]. 2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 1564 - 1568
[44] Noisy training for deep neural networks in speech recognition
Shi Yin
Chao Liu
Zhiyong Zhang
Yiye Lin
Dong Wang
Javier Tejedor
Thomas Fang Zheng
Yinguo Li
[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2015
[45] Deep Neural Networks for Acoustic Modeling in Speech Recognition
Hinton, Geoffrey
Deng, Li
Yu, Dong
Dahl, George E.
Mohamed, Abdel-rahman
Jaitly, Navdeep
Senior, Andrew
Vanhoucke, Vincent
Patrick Nguyen
Sainath, Tara N.
Kingsbury, Brian
[J]. IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) : 82 - 97
[46] A NETWORK OF DEEP NEURAL NETWORKS FOR DISTANT SPEECH RECOGNITION
Ravanelli, Mirco
Brakel, Philemon
Omologo, Maurizio
Bengio, Yoshua
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4880 - 4884
[47] Survey on Deep Neural Networks in Speech and Vision Systems
Alam, M.
Samad, M. D.
Vidyaratne, L.
Glandon, A.
Iftekharuddin, K. M.
[J]. NEUROCOMPUTING, 2020, 417 : 302 - 321
[48] Mongolian Speech Recognition Based on Deep Neural Networks
Zhang, Hui
Bao, Feilong
Gao, Guanglai
[J]. CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (CCL 2015), 2015, 9427 : 180 - 188
[49] Speech bandwidth expansion based on Deep Neural Networks
Wang, Yingxue
Zhao, Shenghui
Liu, Wenbo
Li, Ming
Kuang, Jingming
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2593 - 2597
[50] Emotional Speech Recognition Using Deep Neural Networks
Trinh Van, Loan
Dao Thi Le, Thuy
Le Xuan, Thanh
Castelli, Eric
[J]. SENSORS, 2022, 22 (04)
