Real-time blind source separation system with applications to distant speech recognition

Cited by: 9
Authors
Ferreira, Alberto E. A. [1]
Alarcao, Diogo [2]
Affiliations
[1] Univ Lisbon, Tecn Lisboa, Dept Phys, Ave Rovisco Pais, P-1049001 Lisbon, Portugal
[2] Univ Lisbon, CAPS Tecn Lisboa, Ave Rovisco Pais 1, P-1049001 Lisbon, Portugal
Keywords
Blind source separation; Distant speech recognition; DUET
DOI
10.1016/j.apacoust.2016.06.024
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
A real-time blind source separation (BSS) system based on DUET was developed and implemented to assess its potential as the front end of a distant speech recognition (DSR) engine. The system uses only two closely spaced standard omnidirectional microphones and a computer sound card, and was designed for low-reverberation environments with several human speakers and different noise sources. A novel multi-source real-time audio streaming module was developed, featuring arbitrary statistics, movement tracking, continuity cues such as position and cross-correlation, a kurtosis-based classifier stage for spurious peaks, and spectral subtraction post-processing. Two intrinsic causes of error in the binaural attenuation and delay estimators were identified: FFT spectral leakage and sibilants, both of which violate assumptions that DUET takes for granted. A comprehensive study of time windows was carried out, and new window types were proposed to minimize these violations. In a real room with two human speakers, the implemented system correctly identifies the clusters in the binaural estimator space at distances of up to 2 m from the two microphones, although separation quality degrades quickly beyond 1 m. (C) 2016 Elsevier Ltd. All rights reserved.
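The binaural attenuation and delay estimators mentioned in the abstract can be illustrated with a minimal offline sketch. This is not the authors' implementation: it assumes the textbook DUET estimators (symmetric attenuation a - 1/a and relative delay -angle(X2/X1)/omega, accumulated in a power-weighted 2-D histogram), a plain Hann window, and hypothetical function names and parameter values (`stft`, `duet_estimators`, `histogram_peak`, `n_fft=512`, `hop=256`, 41 histogram bins) chosen only for illustration. It omits the real-time streaming, tracking, kurtosis classifier, and spectral subtraction stages.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def duet_estimators(x1, x2, n_fft=512, hop=256):
    """Per time-frequency-bin symmetric attenuation, delay (samples), weights."""
    # Drop the DC and Nyquist bins: the delay is undefined at omega = 0.
    X1 = stft(x1, n_fft, hop)[:, 1:-1]
    X2 = stft(x2, n_fft, hop)[:, 1:-1]
    omega = 2.0 * np.pi * np.arange(1, n_fft // 2) / n_fft  # rad/sample
    ratio = X2 / (X1 + 1e-12)            # inter-channel ratio per bin
    a = np.abs(ratio)                    # relative attenuation
    alpha = a - 1.0 / (a + 1e-12)        # symmetric attenuation
    delta = -np.angle(ratio) / omega     # relative delay in samples
    weights = np.abs(X1 * X2)            # emphasise high-energy bins
    return alpha, delta, weights

def histogram_peak(alpha, delta, weights, bins=41, max_alpha=2.0, max_delta=2.0):
    """Centre of the strongest cell of the weighted (alpha, delta) histogram."""
    hist, a_edges, d_edges = np.histogram2d(
        alpha.ravel(), delta.ravel(), bins=bins,
        range=[[-max_alpha, max_alpha], [-max_delta, max_delta]],
        weights=weights.ravel())
    i, j = np.unravel_index(np.argmax(hist), hist.shape)
    return (0.5 * (a_edges[i] + a_edges[i + 1]),
            0.5 * (d_edges[j] + d_edges[j + 1]))

if __name__ == "__main__":
    # Synthetic single-source check: channel 2 is channel 1 scaled and delayed.
    rng = np.random.default_rng(0)
    s = rng.standard_normal(16000)
    x2 = 0.7 * np.concatenate(([0.0], s[:-1]))  # gain 0.7, delay 1 sample
    print(histogram_peak(*duet_estimators(s, x2)))
```

With several speakers, each source would be expected to produce its own histogram peak, and the delay estimate is only unambiguous while omega times the delay stays below pi, which is one reason the microphones need to be closely spaced. Because the window covers slightly different samples in the two channels, the per-bin estimates scatter around the true values even in this noise-free case, a small-scale instance of the windowing sensitivity the abstract points to.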
Pages: 170-184
Page count: 15