Syllable Based Language Model for Large Vocabulary Continuous Speech Recognition of Polish

Author
Majewski, Piotr [1]
Affiliation
[1] Univ Lodz, Fac Math & Comp Sci, PL-90238 Lodz, Poland
Source
Keywords
Polish; large vocabulary continuous speech recognition; language modeling; sub-word units; syllable-based units;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most state-of-the-art large vocabulary continuous speech recognition systems use word-based n-gram language models. Such models are not an optimal solution for inflectional or agglutinative languages. Polish is a highly inflectional language and requires very large corpora to build an adequate language model with a small out-of-vocabulary ratio. We propose a syllable-based language model, which is better suited to a highly inflectional language such as Polish. When resources are scarce (i.e., only small corpora are available), the syllable-based model outperforms word-based models in terms of the number of out-of-vocabulary units (syllables in our model). Such a model is an approximation of the morpheme-based model for Polish. In this paper, we present the results of an evaluation of the syllable-based model and its usefulness in speech recognition tasks.
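The out-of-vocabulary comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's method or data: the function `oov_rate`, the helpers `to_words` and `to_syllables`, and the tiny hand-syllabified word lists are all hypothetical and chosen only to show why inflected forms that are unseen as words may still be covered by syllables seen in training.

```python
from typing import Iterable, List


def oov_rate(train_units: Iterable[str], test_units: Iterable[str]) -> float:
    """Return the fraction of test tokens whose unit never occurred in training."""
    vocab = set(train_units)
    test = list(test_units)
    if not test:
        return 0.0
    missing = sum(1 for unit in test if unit not in vocab)
    return missing / len(test)


def to_words(marked: List[str]) -> List[str]:
    """Drop the syllable-boundary hyphens to obtain whole-word units."""
    return [w.replace("-", "") for w in marked]


def to_syllables(marked: List[str]) -> List[str]:
    """Split each hand-marked word at hyphens to obtain syllable units."""
    return [syl for w in marked for syl in w.split("-")]


# Hypothetical toy data: inflected Polish forms, syllabified by hand.
# Hyphens mark assumed syllable boundaries; this is not the paper's corpus.
train_marked = ["do-mu", "ma-my", "ko-ta"]
test_marked = ["do-my", "ma-ma", "ko-ta"]

word_oov = oov_rate(to_words(train_marked), to_words(test_marked))
syllable_oov = oov_rate(to_syllables(train_marked), to_syllables(test_marked))

print(f"word-level OOV rate:     {word_oov:.2f}")
print(f"syllable-level OOV rate: {syllable_oov:.2f}")
```

In this toy setting two of the three test words are unseen as words, while every test syllable already occurs in the training data. A real evaluation would need an actual syllabification procedure for Polish and a realistic corpus, neither of which is specified here.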
Pages: 397-401
Number of pages: 5