Attention-Based Speech Enhancement Using Human Quality Perception Modeling

Cited by: 0
Authors
Nayem, Khandokar Md. [1]
Williamson, Donald S. [1, 2]
Affiliations
[1] Indiana University, Department of Computer Science, Bloomington, IN 47408, United States
[2] Ohio State University, Department of Computer Science and Engineering, Columbus, OH 43210, United States
Keywords
Speech enhancement
DOI
10.1109/TASLP.2023.3328282
Abstract
Perceptually-inspired objective functions, such as the perceptual evaluation of speech quality (PESQ), signal-to-distortion ratio (SDR), and short-time objective intelligibility (STOI), have recently been used to optimize the performance of deep-learning-based speech enhancement algorithms. These objective functions, however, do not always strongly correlate with a listener's assessment of perceptual quality, so optimizing with these measures often results in poorer performance in real-world scenarios. In this work, we propose an attention-based enhancement approach that uses learned speech embedding vectors from a mean-opinion score (MOS) prediction model and a speech enhancement module to jointly enhance noisy speech. The MOS prediction model estimates the perceptual MOS of speech quality, as assessed by human listeners, directly from the audio signal. The enhancement module also employs a quantized language model that enforces spectral constraints for better speech realism and performance. We train the model using real-world noisy speech data that has been captured in everyday environments and test it using unseen corpora. The results show that our proposed approach significantly outperforms other approaches that are optimized with objective measures, and that the predicted quality scores strongly correlate with human judgments. © 2014 IEEE.
Pages: 250-260
Related papers
50 in total
  • [31] Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks
    Yuan, Yu
    Sharoff, Serge
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 1858-1865
  • [32] Modeling human–human interaction with attention-based high-order GCN for trajectory prediction
    Yanyan Fang
    Zhiyu Jin
    Zhenhua Cui
    Qiaowen Yang
    Tianyi Xie
    Bo Hu
    [J]. The Visual Computer, 2022, 38: 2257-2269
  • [33] Effective Exploitation of Posterior Information for Attention-Based Speech Recognition
    Tang, Jian
    Hou, Junfeng
    Song, Yan
    Dai, Li-Rong
    McLoughlin, Ian
    [J]. IEEE ACCESS, 2020, 8: 108988-108999
  • [34] ATTENTION-BASED MULTI-HYPOTHESIS FUSION FOR SPEECH SUMMARIZATION
    Kano, Takatomo
    Ogawa, Atsunori
    Delcroix, Marc
    Watanabe, Shinji
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021: 487-494
  • [35] Human Attention-Based Regions of Interest Extraction Using Computational Intelligence
    Al-Azawi, Mohammad
    Yang, Yingjie
    Istance, Howell
    [J]. 2015 IEEE 8TH GCC CONFERENCE AND EXHIBITION (GCCCE), 2015
  • [36] Attention-based Contextual Language Model Adaptation for Speech Recognition
    Martinez, Richard Diehl
    Novotney, Scott
    Bulyko, Ivan
    Rastrow, Ariya
    Stolcke, Andreas
    Gandhe, Ankur
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021: 1994-2003
  • [37] Modeling Emotion Influence Using Attention-based Graph Convolutional Recurrent Network
    Chen, Yulan
    Wu, Zhiyong
    Jia, Jia
    [J]. ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2019: 302-309
  • [38] A selective attention-based contextual perception approach for a humanoid robot
    Jiang Y.
    Xiao N.
    [J]. Journal of Control Theory and Applications, 2007, 5(3): 244-252
  • [39] A Cross-channel Attention-based Wave-U-Net for Multi-channel Speech Enhancement
    Ho, Minh Tri
    Lee, Jinyoung
    Lee, Bong-Ki
    Yi, Dong Hoon
    Kang, Hong-Goo
    [J]. INTERSPEECH 2020, 2020: 4049-4053
  • [40] Attention-based Neural Network for Driving Environment Complexity Perception
    Zhang, Ce
    Eskandarian, Azim
    Du, Xuelai
    [J]. 2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021: 2781-2787