Incorporating Broad Phonetic Information for Speech Enhancement

Citations: 6
Authors
Lu, Yen-Ju [1 ]
Liao, Chien-Feng [1 ]
Lu, Xugang [2 ]
Hung, Jeih-weih [3 ]
Tsao, Yu [1 ]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
[2] Natl Inst Informat & Commun Technol, Koganei, Tokyo, Japan
[3] Natl Chi Nan Univ, Nantou, Taiwan
Keywords
speech enhancement; broad phonetic classes; articulatory attribute; INTELLIGIBILITY;
DOI
10.21437/Interspeech.2020-1400
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline Codes
100104; 100213;
Abstract
In noisy conditions, knowing the speech content helps listeners more effectively suppress background noise components and retrieve the pure speech signal. Previous studies have also confirmed the benefits of incorporating phonetic information into a speech enhancement (SE) system to achieve better denoising performance. To obtain the phonetic information, we usually prepare a phoneme-based acoustic model, which is trained using speech waveforms and phoneme labels. Although such models perform well under normal noisy conditions, in very noisy conditions the recognized phonemes may be erroneous and thus misguide the SE process. To overcome this limitation, this study proposes to incorporate broad phonetic class (BPC) information into the SE process. We have investigated three criteria to build the BPCs, including two knowledge-based criteria, place and manner of articulation, and one data-driven criterion. Moreover, the recognition accuracies of BPCs are much higher than those of phonemes, thus providing more accurate phonetic information to guide the SE process under very noisy conditions. Experimental results demonstrate that the proposed SE framework with BPC information achieves notable performance improvements over the baseline system and an SE system using monophonic information in terms of both speech quality and intelligibility on the TIMIT dataset.
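The knowledge-based grouping described in the abstract can be sketched as a simple phone-to-BPC lookup. The mapping below is a minimal illustration using a manner-of-articulation grouping over common TIMIT-style phone labels; the class inventory and phone folding actually used in the paper may differ.

```python
# Group TIMIT-style phone labels into broad phonetic classes (BPCs)
# by manner of articulation. Illustrative only: the paper's exact
# class inventory and phone set are not reproduced here.
MANNER_BPC = {
    "vowel":     ["iy", "ih", "eh", "ae", "aa", "ah", "uw", "uh", "ao", "er"],
    "stop":      ["p", "b", "t", "d", "k", "g"],
    "fricative": ["f", "v", "th", "dh", "s", "z", "sh", "zh", "hh"],
    "affricate": ["ch", "jh"],
    "nasal":     ["m", "n", "ng"],
    "glide":     ["l", "r", "w", "y"],
    "silence":   ["sil"],
}

# Invert to a phone -> BPC lookup table.
PHONE_TO_BPC = {p: bpc for bpc, phones in MANNER_BPC.items() for p in phones}

def to_bpc_sequence(phones):
    """Collapse a frame-level phone sequence into its BPC sequence."""
    return [PHONE_TO_BPC.get(p, "other") for p in phones]

print(to_bpc_sequence(["s", "iy", "t", "sil"]))
# -> ['fricative', 'vowel', 'stop', 'silence']
```

Because many confusable phonemes fall into the same class (e.g. all six stops map to "stop"), a classifier over these coarse labels remains reliable at signal-to-noise ratios where frame-level phoneme recognition breaks down, which is the motivation stated in the abstract.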
Pages: 2417-2421
Page count: 5
Related Papers
50 records in total
  • [1] Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information
    Lu, Yen-Ju
    Chang, Chia-Yu
    Yu, Cheng
    Liu, Ching-Feng
    Hung, Jeih-weih
    Watanabe, Shinji
    Tsao, Yu
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2738 - 2750
  • [2] A Study of Incorporating Articulatory Movement Information in Speech Enhancement
    Chen, Yu-Wen
    Hung, Kuo-Hsuan
    Chuang, Shang-Yi
    Sherman, Jonathan
    Lu, Xugang
    Tsao, Yu
    [J]. 29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 496 - 500
  • [3] Segmentation of Continuous Speech for Broad Phonetic Engine
    Deekshitha, G.
    Thennattil, Jubin James
    Mary, Leena
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES, 2015,
  • [4] Mask estimation incorporating phase-sensitive information for speech enhancement
    Wang, Xianyun
    Bao, Changchun
    [J]. APPLIED ACOUSTICS, 2019, 156 : 101 - 112
  • [5] IMPROVING SPEECH ENHANCEMENT WITH PHONETIC EMBEDDING FEATURES
    Wu, Bo
    Yu, Meng
    Chen, Lianwu
    Jin, Mingjie
    Su, Dan
    Yu, Dong
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 645 - 651
  • [6] PHONETIC FEEDBACK FOR SPEECH ENHANCEMENT WITH AND WITHOUT PARALLEL SPEECH DATA
    Plantinga, Peter
    Bagchi, Deblin
    Fosler-Lussier, Eric
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6679 - 6683
  • [7] Incorporating phonetic properties in hidden Markov models for speech recognition
    Sitaram, RNV
    Sreenivas, T
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1997, 102 (02): : 1149 - 1158
  • [8] Phonetic perspectives on modelling information in the speech signal
    Hawkins, S.
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2011, 36 (05): : 555 - 586
  • [9] A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement
    Tal, Or
    Mandel, Moshe
    Kreuk, Felix
    Adi, Yossi
    [J]. INTERSPEECH 2022, 2022, : 1193 - 1197