Speech fragment decoding techniques for simultaneous speaker identification and speech recognition

Cited by: 24
Authors
Barker, Jon [1 ]
Ma, Ning [1 ]
Coy, Andre [1 ]
Cooke, Martin [2 ]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
[2] Univ Basque Country, Fac Ciencias & Tecnol, Dept Elect & Elect, Leioa 48940, Spain
Source
COMPUTER SPEECH AND LANGUAGE | 2010 / Volume 24 / Issue 01
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Speech recognition; Speech separation; Speaker identification; Simultaneous speech; Auditory scene analysis; Noise robustness; CONCURRENT VOWELS; PERCEPTION; MODEL;
DOI
10.1016/j.csl.2008.05.003
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses the problem of recognising speech in the presence of a competing speaker. We review a speech fragment decoding technique that treats segregation and recognition as coupled problems. Data-driven techniques are used to segment a spectro-temporal representation into a set of fragments, such that each fragment is dominated by one or other of the speech sources. A speech fragment decoder is used which employs missing data techniques and clean speech models to simultaneously search for the set of fragments and the word sequence that best matches the target speaker model. The paper investigates the performance of the system on a recognition task employing artificially mixed target and masker speech utterances. The fragment decoder produces significantly lower error rates than a conventional recogniser, and mimics the pattern of human performance that is produced by the interplay between energetic and informational masking. However, at around 0 dB the performance is generally quite poor. An analysis of the errors shows that a large number of target/masker confusions are being made. The paper presents a novel fragment-based speaker identification approach that allows the target speaker to be reliably identified across a wide range of SNRs. This component is combined with the recognition system to produce significant improvements. When the target and masker utterance have the same gender, the recognition system has a performance at 0 dB equal to that of humans; in other conditions the error rate is roughly twice the human error rate. (C) 2008 Elsevier Ltd. All rights reserved.
Pages: 94-111
Number of pages: 18