Towards improving the robustness of distributed speech recognition in packet loss

Cited by: 4
Authors
James, Alastair [1]
Milner, Ben [1]
Affiliations
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
distributed speech recognition; packet loss; interleaving; MAP reconstruction; weighted-Viterbi decoding
DOI
10.1016/j.specom.2006.07.005
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This work addresses the problem of achieving robust distributed speech recognition (DSR) performance in the presence of packet loss. The nature of packet loss is analysed by examining packet loss data gathered from a GSM mobile data channel. This analysis is then used to examine the effect of realistic packet loss conditions on DSR systems, and shows that the accuracy of DSR is more sensitive to burst-like packet loss than to the actual number of lost packets. This leads to the design of a three-stage packet loss compensation scheme. First, interleaving is applied to the transmitted feature vectors to disperse bursts of packet loss. Second, lost feature vectors are reconstructed prior to recognition using a variety of reconstruction techniques. Third, a weighted-Viterbi decoding method is applied to the recogniser itself, which modifies the contribution of the reconstructed feature vectors according to the accuracy of their reconstruction. Experimental results on both a connected digits task and a large-vocabulary task show that simple methods, such as repetition, are not as effective as interpolation methods. The best performance is given by a novel maximum a posteriori (MAP) estimation method, which utilizes temporal statistics of the feature vector stream. This reconstruction method is then combined with weighted-Viterbi decoding, using a novel method to calculate the confidences of reconstructed static and temporal components separately. With interleaving, results improve significantly, and it is shown that a limited level of interleaving can be applied without increasing the delay to the end-user. Using a combination of these techniques for the connected digits task, word accuracy is increased from 49.5% to 95.3% even with a packet loss rate of 50% and an average burst length of 20 feature vectors. (c) 2006 Elsevier B.V. All rights reserved.
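The first stage described in the abstract, interleaving feature vectors so that a burst of channel loss is dispersed into isolated losses, can be illustrated with a minimal sketch. It assumes a simple block interleaver; the abstract does not say which interleaver the authors use, and the names `interleave`, `deinterleave`, `longest_loss_run` and the `depth` parameter are hypothetical choices made for this example only.

```python
def interleave(frames, depth):
    """Block interleaver (illustrative assumption, not the paper's scheme).

    Frames are written row by row into rows of length `depth`
    and read out column by column.
    """
    if len(frames) % depth != 0:
        raise ValueError("number of frames must be a multiple of depth")
    rows = [frames[i:i + depth] for i in range(0, len(frames), depth)]
    return [row[c] for c in range(depth) for row in rows]


def deinterleave(frames, depth):
    """Inverse of interleave(): restore the original frame order."""
    if len(frames) % depth != 0:
        raise ValueError("number of frames must be a multiple of depth")
    n_rows = len(frames) // depth
    cols = [frames[i:i + n_rows] for i in range(0, len(frames), n_rows)]
    return [col[r] for r in range(n_rows) for col in cols]


def longest_loss_run(frames):
    """Length of the longest run of consecutive lost frames (None)."""
    longest = current = 0
    for frame in frames:
        current = current + 1 if frame is None else 0
        longest = max(longest, current)
    return longest


# Sixteen frame indices stand in for feature vectors.
original = list(range(16))

# A burst on the channel wipes out transmitted positions 4..7.
def apply_burst(sent):
    return [None if 4 <= i < 8 else f for i, f in enumerate(sent)]

plain = apply_burst(original)
recovered = deinterleave(apply_burst(interleave(original, 4)), 4)

print(longest_loss_run(plain))      # burst stays contiguous
print(longest_loss_run(recovered))  # burst is spread into isolated losses
```

In this toy setting the same four-frame channel burst leaves four consecutive missing frames without interleaving, but only isolated single-frame gaps after de-interleaving, which are easier for interpolation-style reconstruction to fill. A larger interleaving block disperses longer bursts at the cost of more buffering, which is the delay trade-off the abstract alludes to.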
Pages: 1402-1421
Number of pages: 20