On the Use of Absolute Threshold of Hearing-based Loss for Full-band Speech Enhancement

Cited by: 0

Authors:

Mars, Rohith [1]

Das, Rohan Kumar [1]

Affiliation:

[1] Fortemedia Singapore, Singapore, Singapore

Source:

2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2022

Keywords:

speech enhancement; deep neural networks; absolute threshold of hearing; noise; suppression

DOI:

10.1109/ISCSLP57327.2022.10038050

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we investigate a perceptually motivated loss function for training single-channel full-band speech enhancement models. Specifically, we modify the conventional squared error loss by incorporating a frequency-importance weighting scheme based on the absolute threshold of hearing (ATH). Applying larger weights to the perceptually relevant frequency bins of the speech spectrogram places more emphasis on them during training, with the aim of achieving higher perceptual quality. We compare models trained with the conventional loss and with the proposed ATH-weighted loss on the VCTK and 4th DNS Challenge datasets. The results show that the ATH-weighted loss outperforms the conventional loss on multiple objective speech quality metrics.
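
The abstract does not give the exact weighting function, so the following is only a minimal sketch of what an ATH-weighted squared error could look like. It assumes Terhardt's widely used approximation of the ATH curve; the mapping from threshold to weight (inverted, clipped, rescaled, normalized to unit mean) and the names `ath_db`, `ath_weights`, `weighted_squared_error`, `clip_db`, and `floor` are hypothetical choices for illustration, not the authors' formulation.

```python
import numpy as np

def ath_db(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    # Clamp to 20 Hz so the f^-0.8 term stays finite at the DC bin.
    f_khz = np.maximum(np.asarray(freq_hz, dtype=float), 20.0) / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

def ath_weights(n_bins, sample_rate, clip_db=80.0, floor=0.1):
    """Per-bin weights: a lower hearing threshold gives a larger weight.

    The clipping level, the floor, and the unit-mean normalization are
    assumptions made for this sketch.
    """
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    thr = np.minimum(ath_db(freqs), clip_db)           # limit extreme thresholds
    w = (thr.max() - thr) / (thr.max() - thr.min())    # invert and scale to [0, 1]
    w = floor + (1.0 - floor) * w                      # keep every bin above zero
    return w / w.mean()                                # unit mean keeps the loss scale

def weighted_squared_error(est_mag, ref_mag, weights):
    """Mean squared error over (frames, bins) with per-bin weights."""
    err = (np.asarray(est_mag) - np.asarray(ref_mag)) ** 2
    return float(np.mean(err * weights[np.newaxis, :]))

# Example: 48 kHz full-band audio with a 960-point FFT gives 481 bins.
weights = ath_weights(n_bins=481, sample_rate=48000)
```

With these assumptions the largest weights fall between roughly 3 and 4 kHz, where the threshold curve is lowest, and setting every weight to 1 recovers the conventional squared error.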

Pages: 458-462

Page count: 5
