Monaural voiced speech segregation based on elaborate harmonic grouping strategies

Cited by: 2
Authors
Liu WenJu [1]
Zhang XueLiang [1]
Jiang Wei [1]
Li Peng [2]
Xu Bo [2]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Digital Media Content Technol Res Ctr, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
computational auditory scene analysis; voiced speech separation; harmonistic principle; minimum amplitude principle; elaborate harmonic grouping strategies; BLIND SEPARATION; MODULATION
DOI
10.1007/s11432-011-4506-2
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, an enhanced algorithm for monaural voiced speech segregation, based on several elaborate harmonic grouping strategies, is proposed. The main contributions of the algorithm lie in three aspects. First, the algorithm classifies time-frequency (T-F) units into resolved and unresolved ones by the carrier-to-envelope energy ratio, which yields more accurate classification than cross-channel correlation does. Second, resolved T-F units are grouped according to the harmonic principle as well as the minimum amplitude principle, which has been verified to exist in human perception. Finally, an "enhanced" envelope autocorrelation function is employed to detect amplitude modulation (AM) rates, which substantially reduces half-frequency errors when grouping unresolved units. Systematic evaluation and comparison show that the proposed algorithm greatly improves separation performance. Specifically, the signal-to-noise ratio (SNR) is improved by 0.96 dB over that of the previous method. The algorithm is also effective in improving the PESQ score and the subjective perception score.
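The first and third steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: a T-F unit is taken to be one frame of a single filterbank channel's response (the filterbank itself is omitted), the envelope is extracted with the Hilbert transform, the carrier-to-envelope energy ratio is assumed to be the unit's energy divided by the energy of its mean-removed envelope, and the resolved/unresolved threshold of 10 is hypothetical. The "enhanced" autocorrelation is assumed to follow Tolonen and Karjalainen's enhanced summary autocorrelation (clip to positive values, subtract a copy time-stretched by a factor of 2, clip again), which suppresses the peak at twice the true period that produces half-frequency errors. All function names and parameter values are illustrative.

```python
import numpy as np


def analytic_envelope(x: np.ndarray) -> np.ndarray:
    """Hilbert envelope |x + j*H{x}|, computed with the FFT
    (same recipe as scipy.signal.hilbert)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def carrier_to_envelope_ratio(unit: np.ndarray, eps: float = 1e-12) -> float:
    """ASSUMED definition: energy of the unit's response divided by the
    energy of its envelope fluctuation (envelope with its mean removed)."""
    env = analytic_envelope(unit)
    fluctuation = env - env.mean()
    return float(np.sum(unit ** 2) / (np.sum(fluctuation ** 2) + eps))


def classify_unit(unit: np.ndarray, threshold: float = 10.0) -> str:
    """HYPOTHETICAL threshold: a flat envelope (one dominant harmonic) gives a
    large ratio -> resolved; beating between harmonics -> unresolved."""
    return "resolved" if carrier_to_envelope_ratio(unit) > threshold else "unresolved"


def normalized_acf(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Normalized correlation between mean-removed x[n] and x[n + lag]."""
    x = x - x.mean()
    acf = np.zeros(max_lag + 1)
    for lag in range(max_lag + 1):
        a, b = x[:len(x) - lag], x[lag:]
        denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
        acf[lag] = np.sum(a * b) / denom if denom > 0 else 0.0
    return acf


def enhanced_acf(acf: np.ndarray) -> np.ndarray:
    """Tolonen-Karjalainen style enhancement (ASSUMED to match the paper):
    clip, subtract the clipped ACF stretched in lag by a factor of 2, clip.
    A peak at 2T is cancelled by the stretched copy of the peak at T."""
    clipped = np.maximum(acf, 0.0)
    lags = np.arange(len(clipped))
    stretched = np.interp(lags / 2.0, lags, clipped)  # stretched[t] = clipped[t/2]
    return np.maximum(clipped - stretched, 0.0)


def am_rate(unit: np.ndarray, fs: float,
            f_min: float = 80.0, f_max: float = 500.0) -> float:
    """Estimate the AM rate (Hz) of a unit from its envelope, searching
    periods between 1/f_max and 1/f_min (hypothetical pitch range)."""
    env = analytic_envelope(unit)
    min_lag, max_lag = int(fs / f_max), int(fs / f_min)
    eacf = enhanced_acf(normalized_acf(env, max_lag))
    best_lag = min_lag + int(np.argmax(eacf[min_lag:max_lag + 1]))
    return fs / best_lag
```

For example, a 40 ms unit at 16 kHz containing the 5th and 6th harmonics of a 200 Hz fundamental (1000 Hz and 1200 Hz) has an envelope that beats at 200 Hz, so `classify_unit` labels it unresolved and `am_rate` returns about 200 Hz rather than 100 Hz; a single 1000 Hz tone has a flat envelope and is labeled resolved.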
Pages: 2471–2480
Page count: 10
Published in: Science China Information Sciences, 2011, 54: 2471–2480