On Modeling Non-word Events in Large Vocabulary Continuous Speech Recognition

Cited by: 0
Authors
Sarosi, G. [1]
Tarjan, B. [1]
Balog, A. [2]
Mozsolics, T. [2]
Mihajlik, P. [1,2]
Fegyo, T. [1,3]
Affiliations
[1] Budapest Univ Technol & Econ, Budapest, Hungary
[2] THINKTech Res Ctr Nonprofit LLC, Budapest, Hungary
[3] Aitia Int Inc, Budapest, Hungary
Keywords
LVCSR; Broadcast News; non-word event recognition; cognitive infocommunication; WFST; CLASSIFICATION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper focuses on the integration of non-word acoustic events into LVCSR (Large Vocabulary Continuous Speech Recognition). Non-word events may play an important role in cognitive, paraverbal infocommunication; however, they are often not modeled explicitly because of computational difficulties. In our experiments, a serial and a loopback WFST (Weighted Finite State Transducer) architecture were built to recognize and/or print out certain non-word events in the output. We used a Hungarian Broadcast News corpus to evaluate the results. No degradation in normal word recognition accuracy was observed compared to the baseline, in which no non-word event modeling was applied. The non-word event recognition accuracy was, however, lower than expected. One of the most likely reasons is that the manual transcription of non-word events is less consistent than that of normal words. Nonetheless, some of the non-word events were recognized correctly in most cases. The loopback architecture has a higher memory requirement but gives significantly better non-word event accuracies, without any increase in recognition time.
Pages: 649-653
Number of pages: 5
Related papers
50 in total
  • [1] A word graph algorithm for large vocabulary continuous speech recognition
    Ortmanns, S
    Ney, H
    Aubert, X
    COMPUTER SPEECH AND LANGUAGE, 1997, 11 (01): : 43 - 72
  • [2] Multonic Markov Word Models for Large Vocabulary Continuous Speech Recognition
    Bahl, Lalit R.
    Bellegarda, Jerome R.
    de Souza, Peter V.
    Gopalakrishnan, P. S.
    Nahamoo, David
    Picheny, Michael A.
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1993, 1 (03): : 334 - 344
  • [3] Extensions to the word graph method for large vocabulary continuous speech recognition
    Ney, H
    Ortmanns, S
    Lindam, I
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1791 - 1794
  • [4] Connectionist language modeling for large vocabulary continuous speech recognition
    Schwenk, H
    Gauvain, JL
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 765 - 768
  • [5] Vietnamese Large Vocabulary Continuous Speech Recognition
    Ngoc Thang Vu
    Schultz, Tanja
    2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 333 - 338
  • [6] Advances in large vocabulary continuous speech recognition
    Zweig, G
    Picheny, M
    ADVANCES IN COMPUTERS, VOL. 60: INFORMATION SECURITY, 2004, 60 : 249 - 291
  • [7] Training of across-word phoneme models for large vocabulary continuous speech recognition
    Sixtus, A
    Ney, H
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 849 - 852
  • [8] ADVANCES IN LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION IN GREEK: MODELING AND NONLINEAR FEATURES
    Rodomagoulakis, Isidoros
    Potamianos, Gerasimos
    Maragos, Petros
    2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2013,
  • [9] Modeling word-level rate-of-speech variation in large vocabulary conversational speech recognition
    Zheng, J
    Franco, H
    Stolcke, A
    SPEECH COMMUNICATION, 2003, 41 (2-3) : 273 - 285
  • [10] Integrating a non-probabilistic grammar into large vocabulary continuous speech recognition
    Beutler, R
    Kaufmann, T
    Pfister, B
    2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005, : 104 - 109