Large vocabulary Russian speech recognition using syntactico-statistical language modeling

Cited by: 38
Authors
Karpov, Alexey [1]
Markov, Konstantin [2]
Kipyatkova, Irina [1]
Vazhenina, Dania [2]
Ronzhin, Andrey [1]
Affiliations
[1] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg, Russia
[2] Univ Aizu, Human Interface Lab, Fukushima, Japan
Funding
Russian Foundation for Basic Research;
Keywords
Automatic speech recognition; Slavic languages; Russian speech; Language modeling; Syntactical analysis;
DOI
10.1016/j.specom.2013.07.004
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Speech is the most natural means of human communication, and convenient, efficient human-computer interaction requires state-of-the-art spoken language technology. Research in this area has traditionally focused on a few major languages, such as English, French, Spanish, Chinese, and Japanese, while other languages, particularly those of Eastern Europe, have received much less attention. Recently, however, research on speech technologies for Czech, Polish, Serbo-Croatian, and Russian has been steadily increasing. In this paper, we describe our efforts to build a large-vocabulary automatic speech recognition (ASR) system for Russian. Russian is a synthetic, highly inflected language with many roots and affixes, which greatly reduces the performance of ASR systems designed using traditional approaches. In our work, we paid special attention to the specifics of the Russian language when developing the acoustic, lexical, and language models. A special software tool for pronunciation lexicon creation was developed. For the acoustic model, we investigated a combination of knowledge-based and statistical approaches to create several different phoneme sets, the best of which was determined experimentally. For the language model (LM), we introduced a new method that combines syntactical and statistical analysis of the training text data in order to build better n-gram models. Evaluation experiments were performed using two different Russian speech databases and an internally collected text corpus. Among the phoneme sets we created, the one with 47 phonemes achieved the fewest word-level recognition errors, so we used it in the subsequent language modeling evaluations. Experiments with a 204-thousand-word vocabulary ASR system were performed to compare standard statistical n-gram LMs with the language models created using our syntactico-statistical method. The results demonstrated that the proposed language modeling approach is capable of reducing word recognition errors. (C) 2013 Elsevier B.V. All rights reserved.
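The abstract does not spell out how syntactical and statistical analysis are combined. As a rough illustration only, the sketch below assumes one plausible reading: pairs of words that a syntactic parser marks as grammatically linked but that are not adjacent in the text are added as extra bigram counts on top of the ordinary adjacent-word counts. The function names, the add-alpha smoothing, and the hand-written dependency pair are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def bigram_counts(sentences, dependency_pairs=()):
    """Count adjacent-word bigrams, then add syntactically linked pairs.

    Assumption: `dependency_pairs` holds (head, dependent) word pairs that a
    parser found to be grammatically linked but separated by other words.
    """
    counts = Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        counts.update(zip(padded, padded[1:]))  # ordinary statistical bigrams
    counts.update(dependency_pairs)  # extra bigrams from syntactic analysis
    return counts


def bigram_prob(counts, prev, word, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev)."""
    context_total = sum(c for (p, _), c in counts.items() if p == prev)
    return (counts[(prev, word)] + alpha) / (context_total + alpha * vocab_size)


# Toy corpus: the subject and the verb agree but are separated by two words.
sentences = [["девочка", "в", "саду", "читала"]]
deps = [("девочка", "читала")]  # hypothetical parser output

vocab = {w for s in sentences for w in s} | {"</s>"}
baseline = bigram_counts(sentences)
augmented = bigram_counts(sentences, deps)

p_base = bigram_prob(baseline, "девочка", "читала", len(vocab))
p_aug = bigram_prob(augmented, "девочка", "читала", len(vocab))
print(p_base, p_aug)
```

In this toy setting the augmented model assigns a higher probability to the long-distance pair than the purely statistical baseline does, which is the kind of effect such a combined model would aim for in a language with relatively free word order.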
Pages: 213-228
Number of pages: 16