Blind source extraction for robust speech recognition in multisource noisy environments

Cited by: 15
Authors
Nesta, Francesco [1 ]
Matassoni, Marco [1 ]
Affiliations
[1] Fdn Bruno Kessler CIT Irst, I-38123 Trento, Italy
Source
COMPUTER SPEECH AND LANGUAGE | 2013 / Vol. 27 / Issue 03
Keywords
SEPARATION;
DOI
10.1016/j.csl.2012.08.001
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes and describes a complete system for Blind Source Extraction (BSE). The goal is to extract a target signal source in order to recognize spoken commands uttered in reverberant and noisy environments, and acquired by a microphone array. The architecture of the BSE system is based on multiple stages: (a) TDOA estimation, (b) mixing system identification for the target source, (c) on-line semi-blind source separation and (d) source extraction. All the stages are effectively combined, allowing the estimation of the target signal with limited distortion. While a generalization of the BSE framework is described, here the proposed system is evaluated on the data provided for the CHiME Pascal 2011 competition, i.e. binaural recordings made in a real-world domestic environment. The CHiME mixtures are processed with the BSE and the recovered target signal is fed to a recognizer, which uses noise robust features based on Gammatone Frequency Cepstral Coefficients. Moreover, acoustic model adaptation is applied to further reduce the mismatch between training and testing data and improve the overall performance. A detailed comparison between different models and algorithmic settings is reported, showing that the approach is promising and the resulting system gives a significant reduction of the error rate. (c) 2012 Elsevier Ltd. All rights reserved.
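The abstract names TDOA estimation as the first stage but does not say which estimator is used. As a rough illustration only, the sketch below uses GCC-PHAT, a common choice for two-channel recordings; this is an assumption, not the authors' method. The function name `gcc_phat_tdoa` and its parameters are hypothetical.

```python
import numpy as np

def gcc_phat_tdoa(x_left, x_right, fs, max_tau=None):
    """Estimate the delay (seconds) of x_left relative to x_right with GCC-PHAT.

    Illustrative sketch only: the estimator choice is an assumption,
    not taken from the paper summarized above.
    """
    n = len(x_left) + len(x_right)           # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x_left, n=n)
    Y = np.fft.rfft(x_right, n=n)
    cross = X * np.conj(Y)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)            # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:                  # optionally limit the search range
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift   # lag (in samples) of the peak
    return shift / fs
```

A positive return value means the first channel lags the second. In a binaural setup, `max_tau` would typically be set from the largest physically plausible inter-microphone delay.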
Pages: 703-725
Page count: 23
Related papers
50 in total
  • [31] Robust Recognition of English Speech in Noisy Environments Using Frequency Warped Signal Processing
    Upadhyay, Navneet
    Gamboa Rosales, Hamurabi
    NATIONAL ACADEMY SCIENCE LETTERS-INDIA, 2018, 41 (01): 15 - 22
  • [32] Robust Recognition of English Speech in Noisy Environments Using Frequency Warped Signal Processing
    Navneet Upadhyay
    Hamurabi Gamboa Rosales
    National Academy Science Letters, 2018, 41 : 15 - 22
  • [33] AMPLITUDE MODULATION SPECTROGRAM BASED FEATURES FOR ROBUST SPEECH RECOGNITION IN NOISY AND REVERBERANT ENVIRONMENTS
    Moritz, Niko
    Anemueller, Joern
    Kollmeier, Birger
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011: 5492 - 5495
  • [34] Modulation domain blind speech separation in noisy environments
    Zhang, Yi
    Zhao, Yunxin
    SPEECH COMMUNICATION, 2013, 55 (10): 1081 - 1099
  • [35] Special issue on speech separation and recognition in multisource environments Preface
    Barker, Jon
    Vincent, Emmanuel
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03): 619 - 620
  • [36] Robust speech recognition in a high interference real room environment using Blind Speech Extraction
    Koutras, A
    Dermatas, E
    DSP 2002: 14TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING PROCEEDINGS, VOLS 1 AND 2, 2002: 167 - 171
  • [37] Robust speech recognition in car environments
    Shozakai, M
    Nakamura, S
    Shikano, K
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998: 269 - 272
  • [38] Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments
    Bashirpour, Meysam
    Geravanchizadeh, Masoud
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018
  • [39] Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments
    Meysam Bashirpour
    Masoud Geravanchizadeh
    EURASIP Journal on Audio, Speech, and Music Processing, 2018
  • [40] Speech recognition in noisy environments with Convolutional Neural Networks
    Santos, Rafael M.
    Matos, Leonardo N.
    Macedo, Hendrik T.
    Montalvao, Jugurta
    2015 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2015), 2015: 175 - 179