Towards improving the robustness of distributed speech recognition in packet loss

Cited by: 4
Authors
James, Alastair [1]
Milner, Ben [1]
Affiliations
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
distributed speech recognition; packet loss; interleaving; MAP reconstruction; weighted-Viterbi decoding
DOI
10.1016/j.specom.2006.07.005
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This work addresses the problem of achieving robust distributed speech recognition (DSR) performance in the presence of packet loss. The nature of packet loss is analysed by examining packet loss data gathered from a GSM mobile data channel. This analysis is then used to examine the effect of realistic packet loss conditions on DSR systems, and shows that the accuracy of DSR is more sensitive to the burstiness of packet loss than to the actual number of lost packets. This leads to the design of a three-stage packet loss compensation scheme. First, interleaving is applied to the transmitted feature vectors to disperse bursts of packet loss. Second, lost feature vectors are reconstructed prior to recognition using a variety of reconstruction techniques. Third, a weighted-Viterbi decoding method is applied to the recogniser itself, which modifies the contribution of the reconstructed feature vectors according to the accuracy of their reconstruction. Experimental results on both a connected digits task and a large-vocabulary task show that simple methods, such as repetition, are not as effective as interpolation methods. The best performance is given by a novel maximum a posteriori (MAP) estimation, which utilizes temporal statistics of the feature vector stream. This reconstruction method is then combined with weighted-Viterbi decoding, using a novel method to calculate the confidences of reconstructed static and temporal components separately. With interleaving, results improve significantly, and it is shown that a limited level of interleaving can be applied without increasing the delay to the end-user. Using a combination of these techniques for the connected digits task, word accuracy is increased from 49.5% to 95.3% even with a packet loss rate of 50% and an average burst length of 20 feature vectors. © 2006 Elsevier B.V. All rights reserved.
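The first two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the block interleaver design, the function names, and the toy sizes (16 frames, depth 4, a burst of 4) are all assumptions chosen for illustration, and linear interpolation stands in for the paper's MAP estimator, whose details are not given here.

```python
import numpy as np

def interleave_order(n_frames, depth):
    """Transmission order for a simple block interleaver (assumed design):
    frame indices are written row-wise into a depth x (n_frames // depth)
    grid and read out column-wise."""
    if n_frames % depth != 0:
        raise ValueError("n_frames must be a multiple of depth")
    return np.arange(n_frames).reshape(depth, n_frames // depth).T.ravel()

def lost_frames(order, burst_start, burst_len):
    """Boolean mask, in original frame order, of the frames lost when a burst
    hits transmit positions [burst_start, burst_start + burst_len)."""
    mask = np.zeros(len(order), dtype=bool)
    mask[order[burst_start:burst_start + burst_len]] = True
    return mask

def longest_run(mask):
    """Length of the longest run of consecutive lost frames."""
    best = run = 0
    for lost in mask:
        run = run + 1 if lost else 0
        best = max(best, run)
    return best

def interpolate_lost(features, mask):
    """Linearly interpolate lost rows of a (frames x dims) array from the
    received rows. A simple stand-in, not the paper's MAP estimator."""
    idx = np.arange(len(features))
    out = features.copy()
    for d in range(features.shape[1]):
        out[mask, d] = np.interp(idx[mask], idx[~mask], features[~mask, d])
    return out

# Toy example: the same 4-frame burst with and without interleaving.
n, depth = 16, 4
plain = lost_frames(np.arange(n), 4, 4)                   # no interleaving
spread = lost_frames(interleave_order(n, depth), 4, 4)    # with interleaving
print(longest_run(plain), longest_run(spread))            # prints "4 1"

# Isolated losses are then easy to fill from their received neighbours.
features = np.outer(np.arange(float(n)), [1.0, 2.0])      # hypothetical features
recon = interpolate_lost(features, spread)
```

In this toy case the interleaver turns one burst of four consecutive lost frames into four isolated losses, each with received neighbours on both sides, which is the situation in which interpolation-style reconstruction works best.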
Pages: 1402-1421
Number of pages: 20