Syllable Based Language Model for Large Vocabulary Continuous Speech Recognition of Polish

Cited by: 0
Authors
Majewski, Piotr [1]
Affiliations
[1] Univ Lodz, Fac Math & Comp Sci, PL-90238 Lodz, Poland
Source
Keywords
Polish; large vocabulary continuous speech recognition; language modeling; sub-word units; syllable-based units;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most state-of-the-art large vocabulary continuous speech recognition systems use word-based n-gram language models. Such models are not an optimal solution for inflectional or agglutinative languages. Polish is a highly inflectional language and requires very large corpora to build an adequate language model with a small out-of-vocabulary ratio. We propose a syllable-based language model, which is better suited to highly inflectional languages like Polish. When resources are scarce (i.e., only small corpora are available), the syllable-based model outperforms word-based models in terms of the number of out-of-vocabulary units (syllables in our model). Such a model is an approximation of the morpheme-based model for Polish. In this paper, we present the results of an evaluation of the syllable-based model and its usefulness in speech recognition tasks.
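The out-of-vocabulary comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the helper names `naive_syllables` and `oov_rate`, the toy training and test word lists, and above all the splitting rule, which simply cuts after each vowel group and is not a linguistically correct Polish syllabifier.

```python
import re

# Polish vowel letters (lowercase); used by the toy splitter below.
VOWELS = "aąeęioóuy"

def naive_syllables(word):
    """Toy splitter (assumption, not the paper's method): cut after each
    vowel group and attach any trailing consonants to the last chunk."""
    chunks = re.findall(rf"[^{VOWELS}]*[{VOWELS}]+", word)
    if not chunks:
        return [word]  # no vowel found: keep the word as one unit
    tail = word[sum(len(c) for c in chunks):]
    chunks[-1] += tail
    return chunks

def oov_rate(train_units, test_units):
    """Fraction of test units that never occur in the training units."""
    vocab = set(train_units)
    missing = sum(1 for u in test_units if u not in vocab)
    return missing / len(test_units)

# Hypothetical toy data: inflected forms of a few Polish nouns.
train_words = ["dom", "domu", "kota", "kot", "mama"]
test_words = ["domem", "kotu", "mamy"]

train_syllables = [s for w in train_words for s in naive_syllables(w)]
test_syllables = [s for w in test_words for s in naive_syllables(w)]

word_oov = oov_rate(train_words, test_words)
syllable_oov = oov_rate(train_syllables, test_syllables)

print("word-level OOV rate:", word_oov)
print("syllable-level OOV rate:", syllable_oov)
```

On this toy data every test word is an unseen inflected form, while some of its syllables were already observed in training, so the syllable-level rate comes out lower. A real evaluation would need a proper syllabification procedure and a real corpus.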
Pages: 397-401
Number of pages: 5