GiniSupport vector machines for segmental minimum Bayes risk decoding of continuous speech

Cited by: 5
Authors
Venkataramani, Veera
Chakrabartty, Shantanu
Byrne, William
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] Fair Isaac Corp, San Diego, CA 92130 USA
[3] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48824 USA
Source
COMPUTER SPEECH AND LANGUAGE | 2007 / Vol. 21 / Issue 03
DOI
10.1016/j.csl.2006.08.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We describe the use of support vector machines (SVMs) for continuous speech recognition by incorporating them in segmental minimum Bayes risk decoding. Lattice cutting is used to convert the automatic speech recognition search space into sequences of smaller recognition problems. SVMs are then trained as discriminative models over each of these problems and used in a rescoring framework. We pose the estimation of a posterior distribution over hypotheses in these regions of acoustic confusion as a logistic regression problem. We also show that GiniSVMs can be used as an approximation technique to estimate the parameters of the logistic regression problem. On a small-vocabulary recognition task we show that the use of GiniSVMs can improve the performance of a well-trained hidden Markov model system trained under the Maximum Mutual Information criterion. We also find that it is possible to derive reliable confidence scores over the GiniSVM hypotheses and that these can be used to good effect in hypothesis combination. We discuss the problems that we expect to encounter in extending this approach to large-vocabulary continuous speech recognition and describe an initial investigation of constrained estimation techniques to derive feature spaces for SVMs. (c) 2006 Elsevier Ltd. All rights reserved.
Pages: 423-442
Page count: 20
Related papers
27 items in total
  • [21] Lattice segmentation and support vector machines for large vocabulary continuous speech recognition
    Venkataramani, V
    Byrne, W
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 817 - 820
  • [22] MINIMUM BAYES RISK SIGNAL DETECTION FOR SPEECH ENHANCEMENT BASED ON A NARROWBAND DOA MODEL
    Taseska, Maja
    Habets, Emanuel A. P.
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 539 - 543
  • [23] High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics
    Freitag, Markus
    Grangier, David
    Tan, Qijun
    Liang, Bowen
    [J]. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 811 - 825
  • [24] Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition
    Weng, Chao
    Yu, Chengzhu
    Cui, Jia
    Zhang, Chunlei
    Yu, Dong
    [J]. INTERSPEECH 2020, 2020, : 966 - 970
  • [25] Discrimination of speech and monophonic singing in continuous audio streams applying multi-layer support vector machines
    Schuller, B
    Rigoll, G
    Lang, M
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1-3, 2004, : 1655 - 1658
  • [26] Spotting consonant-vowel units in continuous speech using autoassociative neural networks and support vector machines
    Gangashetty, SV
    Sekhar, CC
    Yegnanarayana, B
    [J]. MACHINE LEARNING FOR SIGNAL PROCESSING XIV, 2004, : 401 - 410
  • [27] A hybrid system based on hidden Markov models and support vector machines with forward learning for phone recognition in Venezuelan continuous speech
    Jabbour, Georges
    Maldonado, Luciano
    Sarmiento, Maria
    [J]. INGENIERIA UC, 2011, 18 (03): : 7 - 16