A Segmental CRF Approach to Large Vocabulary Continuous Speech Recognition

Cited by: 48
Authors
Zweig, Geoffrey [1]
Nguyen, Patrick [1]
Affiliations
[1] Microsoft Research, Redmond, WA, USA
Keywords
speech recognition; conditional random field; direct modeling; detector features
DOI
10.1109/ASRU.2009.5372916
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a segmental conditional random field framework for large vocabulary continuous speech recognition. Fundamental to this approach is the use of acoustic detectors as the basic input, and the automatic construction of a versatile set of segment-level features. The detector streams operate at multiple time scales (frame, phone, multi-phone, syllable or word) and are combined at the word level in the CRF training and decoding processes. A key aspect of our approach is that features are defined at the word level, and are naturally geared to explain long span phenomena such as formant trajectories, duration, and syllable stress patterns. Generalization to unseen words is possible through the use of decomposable consistency features [1], [2], and our framework allows for the joint or separate discriminative training of the acoustic and language models. An initial evaluation of this framework with voice search data from the Bing Mobile (BM) application results in a 2% absolute improvement over an HMM baseline.
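To illustrate the word-level combination the abstract describes, the following is a minimal sketch of segmental decoding over a single phone-detector stream. It is not the authors' implementation: the lexicon, the feature set (phone matches, mismatches, length difference, a bigram indicator), the weights, and the names `segment_score` and `decode` are all hypothetical, chosen only to show how segment-level scores can be summed and maximized jointly over segmentations and word labels.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: word -> expected phone sequence.
LEXICON: Dict[str, List[str]] = {
    "go": ["g", "ow"],
    "to": ["t", "uw"],
    "bing": ["b", "ih", "ng"],
}

# Hypothetical feature weights; in a real system these would be trained.
WEIGHTS = {"match": 1.0, "mismatch": -1.0, "length_diff": -0.5, "bigram": 0.5}

# Hypothetical language-model indicator feature: word pairs seen in training.
BIGRAMS = {("<s>", "go"), ("go", "to"), ("to", "bing")}


def segment_score(prev_word: str, word: str, span: List[str]) -> float:
    """Weighted sum of word-level features for one hypothesized segment."""
    expected = LEXICON[word]
    matches = sum(1 for a, b in zip(expected, span) if a == b)
    mismatches = min(len(expected), len(span)) - matches
    length_diff = abs(len(expected) - len(span))
    score = (WEIGHTS["match"] * matches
             + WEIGHTS["mismatch"] * mismatches
             + WEIGHTS["length_diff"] * length_diff)
    if (prev_word, word) in BIGRAMS:
        score += WEIGHTS["bigram"]
    return score


def decode(detections: List[str], max_len: int = 4) -> Tuple[List[str], float]:
    """Find the best-scoring segmentation and word labeling of the detections."""
    n = len(detections)
    Key = Tuple[int, str]
    # best[(end, word)] = (score, backpointer to the previous (end, word) key)
    best: Dict[Key, Tuple[float, Optional[Key]]] = {(0, "<s>"): (0.0, None)}
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            span = detections[start:end]
            # Snapshot the table: only entries ending at `start` are extended.
            for (pos, prev), (prev_score, _) in list(best.items()):
                if pos != start:
                    continue
                for word in LEXICON:
                    cand = prev_score + segment_score(prev, word, span)
                    key = (end, word)
                    if key not in best or cand > best[key][0]:
                        best[key] = (cand, (start, prev))
    # Pick the best hypothesis that covers all detections, then trace back.
    final_key = max((k for k in best if k[0] == n), key=lambda k: best[k][0])
    total = best[final_key][0]
    words: List[str] = []
    key: Optional[Key] = final_key
    while key is not None and key[1] != "<s>":
        words.append(key[1])
        key = best[key][1]
    words.reverse()
    return words, total


if __name__ == "__main__":
    print(decode(["g", "ow", "t", "uw", "b", "ih", "ng"]))
```

The sketch covers only the maximization step. Training a conditional model would also require summing over segmentations to normalize the scores, which is omitted here.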
Pages: 152-157
Number of pages: 6
    [J]. JOURNAL ON MULTIMODAL USER INTERFACES, 2018, 12 (04) : 309 - 318