PHONEME-BASED DISTRIBUTION REGULARIZATION FOR SPEECH ENHANCEMENT

Cited by: 0
Authors
Liu, Yajing [1]
Peng, Xiulian [2]
Xiong, Zhiwei [1]
Lu, Yan [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
Speech enhancement; phoneme; SEPARATION; MASKING
DOI
10.1109/ICASSP39728.2021.9414761
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Existing speech enhancement methods mainly separate speech from noise at the signal level or in the time-frequency domain, and seldom pay attention to the semantic information of a corrupted signal. In this paper, we aim to bridge this gap by extracting phoneme identities to help speech enhancement. Specifically, we propose a phoneme-based distribution regularization (PbDr) for speech enhancement, which incorporates frame-wise phoneme information into a speech enhancement network in a conditional manner. Because different phonemes lead to different feature distributions in frequency, we propose to learn a parameter pair, i.e., a scale and a bias, from a phoneme classification vector to modulate the speech enhancement network. The modulation parameters carry not only frame-wise but also frequency-wise conditions, which effectively map features to phoneme-related distributions. In this way, we explicitly regularize speech enhancement features with recognition vectors. Experiments on public datasets demonstrate that the proposed PbDr module not only improves the perceptual quality of the enhanced speech but also increases the recognition accuracy of an ASR system on that speech. The PbDr module could also be readily incorporated into other speech enhancement networks.
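The abstract describes the phoneme-conditioned modulation only at a high level. As a rough illustration, the Python/NumPy sketch below assumes a FiLM/SPADE-style conditional normalization: each channel of a (C, T, F) feature map is normalized, then scaled and shifted by (T, F) maps obtained by linearly projecting frame-wise phoneme posteriors, so the conditions vary both per frame and per frequency bin. The function name `phoneme_modulation`, the tensor layout, the normalization choice, and the linear projections (`w_scale`, `b_scale`, `w_bias`, `b_bias`) are all hypothetical and are not taken from the paper.

```python
import numpy as np


def phoneme_modulation(features, phoneme_probs, w_scale, b_scale, w_bias, b_bias, eps=1e-5):
    """Modulate enhancement features with phoneme-conditioned scale and bias.

    Hypothetical sketch, not the authors' PbDr implementation.
    features:        (C, T, F) feature map (channels, frames, frequency bins)
    phoneme_probs:   (T, K) frame-wise phoneme classification vectors
    w_scale, w_bias: (K, F) hypothetical projection weights
    b_scale, b_bias: (F,)   hypothetical projection offsets
    """
    # Assumed normalization: zero mean, unit variance per channel over (T, F).
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)

    # Each frame's phoneme vector is projected to one value per frequency bin,
    # so scale and bias are both frame-wise and frequency-wise: shape (T, F).
    scale = phoneme_probs @ w_scale + b_scale
    bias = phoneme_probs @ w_bias + b_bias

    # Broadcast the (T, F) maps over channels: (C, T, F) op (1, T, F).
    return normalized * scale[None] + bias[None]


# Toy usage with random data and random (untrained) projections.
rng = np.random.default_rng(0)
C, T, F, K = 4, 10, 257, 40
features = rng.standard_normal((C, T, F))
logits = rng.standard_normal((T, K))
phoneme_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # per-frame softmax
w_scale, w_bias = 0.1 * rng.standard_normal((K, F)), 0.1 * rng.standard_normal((K, F))
b_scale, b_bias = np.ones(F), np.zeros(F)

out = phoneme_modulation(features, phoneme_probs, w_scale, b_scale, w_bias, b_bias)
print(out.shape)  # (4, 10, 257)
```

In a trained model the projections would presumably be learned jointly with the enhancement network; here they are random, so the example only demonstrates the shapes and the broadcasting of per-frame, per-frequency conditions.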
Pages: 726-730
Number of pages: 5
Related papers
50 in total
  • [1] Improved Phoneme-Based Myoelectric Speech Recognition
    Zhou, Quan
    Jiang, Ning
    Englehart, Kevin
    Hudgins, Bernard
    [J]. IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2009, 56 (08): 2016-2023
  • [2] Myoelectric signal classification for phoneme-based speech recognition
    Scheme, Erik J.
    Hudgins, Bernard
    Parker, Phillip A.
    [J]. IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2007, 54 (04): 694-699
  • [3] PHONEME-BASED SPEECH CHIP NEEDS LESS MEMORY
    BASSAK, G
    [J]. ELECTRONICS, 1980, 53 (15): 43-44
  • [4] SYNTHESIS OF ARABIC SPEECH USING PHONEME-BASED SYNTHESIZERS
    MANDURAH, MM
    [J]. JOURNAL OF ENGINEERING SCIENCES, 1984, 10 (1-2): 9-14
  • [5] Phoneme-based vector quantization in a discrete HMM speech recognizer
    Zhang, YX
    Togneri, R
    Alder, M
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1997, 5 (01): 26-32
  • [6] A PHONEME-BASED PRE-TRAINING APPROACH FOR DEEP NEURAL NETWORK WITH APPLICATION TO SPEECH ENHANCEMENT
    Chazan, Shlomo E.
    Gannot, Sharon
    Goldberger, Jacob
    [J]. 2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016
  • [7] A STOCHASTIC SEGMENT MODEL FOR PHONEME-BASED CONTINUOUS SPEECH RECOGNITION
    OSTENDORF, M
    ROUKOS, S
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1989, 37 (12): 1857-1869
  • [8] Stochastic Filter Approaches for a Phoneme-Based Segmentation of Speech Signals
    Rauh, Andreas
    Tiede, Susann
    Klenke, Cornelia
    [J]. 2016 21ST INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS (MMAR), 2016: 732-737
  • [9] Complexity of articulation planning in apraxia of speech: The limits of phoneme-based approaches
    Ziegler, Wolfram
    [J]. COGNITIVE NEUROPSYCHOLOGY, 2017, 34 (7-8): 482-487
  • [10] EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System
    Li, Hao
    Kang, Yongguo
    Wang, Zhenyu
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 3077-3081