On the Use of Absolute Threshold of Hearing-based Loss for Full-band Speech Enhancement

Cited by: 0
Authors
Mars, Rohith [1]
Das, Rohan Kumar [1]
Affiliations
[1] Fortemedia Singapore, Singapore, Singapore
Keywords
speech enhancement; deep neural networks; absolute threshold of hearing; noise; suppression
DOI
10.1109/ISCSLP57327.2022.10038050
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the use of a perceptually motivated loss function for training single-channel full-band speech enhancement models. Specifically, we modify the conventional squared error loss function by incorporating a frequency-importance weighting scheme based on the absolute threshold of hearing (ATH). By applying larger weights to the perceptually relevant frequency bins of the speech spectrogram, we place more emphasis on those bins and train the speech enhancement model to target higher perceptual quality. We compare models trained with the conventional loss and with the proposed ATH-weighted loss on the VCTK and 4th DNS Challenge datasets. The results demonstrate that the loss with the proposed ATH-based weighting scheme outperforms the conventional loss on multiple objective speech quality metrics.
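The abstract does not give the exact weighting formula, so the following is only a minimal sketch of what an ATH-weighted squared error could look like. It uses Terhardt's well-known approximation of the ATH curve; the mapping from threshold to weight (`ath_weights`), the 48 kHz sample rate, the 80 dB ceiling, and the 0.1 weight floor are all illustrative assumptions, not the authors' scheme.

```python
import math


def ath_db(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    # Clamp to 20 Hz: the formula diverges as the frequency approaches 0.
    f = max(freq_hz, 20.0) / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def bin_frequencies(n_bins, sample_rate=48000):
    """Centre frequency of each spectrogram bin from 0 Hz to Nyquist."""
    nyquist = sample_rate / 2.0
    return [k * nyquist / (n_bins - 1) for k in range(n_bins)]


def ath_weights(n_bins, sample_rate=48000, ceiling_db=80.0, floor=0.1):
    """Hypothetical per-bin weights: larger where the hearing threshold is lower.

    This mapping is an assumption made for illustration only.
    """
    thresholds = [min(ath_db(f), ceiling_db)
                  for f in bin_frequencies(n_bins, sample_rate)]
    lo, hi = min(thresholds), max(thresholds)
    return [floor + (1.0 - floor) * (hi - t) / (hi - lo) for t in thresholds]


def weighted_squared_error(estimate, target, weights):
    """Mean over frames and bins of weights[k] * (estimate - target) ** 2."""
    total, count = 0.0, 0
    for est_frame, tgt_frame in zip(estimate, target):
        for w, e, t in zip(weights, est_frame, tgt_frame):
            total += w * (e - t) ** 2
            count += 1
    return total / count
```

With all weights set to 1.0 this reduces to the conventional mean squared error, so the two losses compared in the abstract differ only in the per-bin weight vector. Under these assumptions the largest weight falls near 3.3 kHz, where the threshold curve is lowest.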
Pages: 458-462
Page count: 5
Related papers
50 in total
  • [1] Local spectral attention for full-band speech enhancement
    Hou, Zhongshu
    Hu, Qinwen
    Chen, Kai
    Cao, Zhanzhong
    Lu, Jing
    [J]. JASA EXPRESS LETTERS, 2023, 3 (11):
  • [2] Learnable spectral dimension compression mapping for full-band speech enhancement
    Hu, Qinwen
    Hou, Zhongshu
    Chen, Kai
    Lu, Jing
    [J]. JASA EXPRESS LETTERS, 2023, 3 (02):
  • [3] DEEPFILTERNET: A LOW COMPLEXITY SPEECH ENHANCEMENT FRAMEWORK FOR FULL-BAND AUDIO BASED ON DEEP FILTERING
    Schroeter, Hendrik
    Escalante-B, Alberto N.
    Rosenkranz, Tobias
    Maier, Andreas
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7407 - 7411
  • [4] Lightweight Full-band and Sub-band Fusion Network for Real Time Speech Enhancement
    Chen, Zhuangqi
    Zhang, Pingjian
    [J]. INTERSPEECH 2022, 2022, : 921 - 925
  • [5] A Full-Band Adaptive Harmonic Representation of Speech
    Degottex, Gilles
    Stylianou, Yannis
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 382 - 385
  • [6] Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Full-Band Speech Enhancement
    Yu, Guochen
    Li, Andong
    Liu, Wenzhe
    Zheng, Chengshi
    Wang, Yutian
    Wang, Hui
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 483 - 487
  • [7] A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement
    Valin, Jean-Marc
    [J]. 2018 IEEE 20TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2018,
  • [8] DPT-FSNET: DUAL-PATH TRANSFORMER BASED FULL-BAND AND SUB-BAND FUSION NETWORK FOR SPEECH ENHANCEMENT
    Dang, Feng
    Chen, Hangting
    Zhang, Pengyuan
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6857 - 6861
  • [9] FSI-Net: A dual-stage full- and sub-band integration network for full-band speech enhancement
    Yu, Guochen
    Wang, Hui
    Li, Andong
    Liu, Wenzhe
    Zhang, Yuan
    Wang, Yutian
    Zheng, Chengshi
    [J]. APPLIED ACOUSTICS, 2023, 211
  • [10] DMF-Net: A decoupling-style multi-band fusion model for full-band speech enhancement
    Yu, Guochen
    Guan, Yuansheng
    Meng, Weixin
    Zheng, Chengshi
    Wang, Hui
    Wang, Yutian
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1382 - 1387