Large vocabulary speech recognition of Slovenian language using morphological models

Authors
Maucec, M [1]
Rotovnik, T [1]
Kacic, Z [1]
Horvat, B [1]
Affiliations
[1] Univ Maribor, Inst Elect, Fac Elect Engn & Comp Sci, SI-2000 Maribor, Slovenia
Keywords
language modelling; automatic continuous speech recognition; morphology; large vocabulary; data-driven methods; topic adaptation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper concerns the development of an automatic speech recognition system for the Slovenian language. The large number of unique words in inflected languages is identified as the primary reason for performance degradation. The paper focuses on statistical language models and examines a novel variation on n-gram modelling in which the modelling units are stems and endings instead of words. Only data-driven algorithms are used to decompose words into stems and endings automatically. Using stems and endings to model Slovenian significantly reduces the out-of-vocabulary (OOV) rate. We also discuss corpus-based topic-adapted language models. Language models are most often used in topic-homogeneous environments. Topic detection in a highly inflected language is complicated by the appearance of several word forms derived from the same lemma; this problem is solved by using data-driven algorithms to group words of the same lemma into classes.
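The abstract does not say which data-driven decomposition algorithm the authors use. As a rough illustration of the general idea only, the sketch below learns candidate endings as suffixes shared by several training words, splits each word into a stem and an ending, and compares the OOV rate at word level and at stem-and-ending level. The suffix-frequency heuristic, the function names, the `+` marker on endings, and the toy word lists are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter

def learn_endings(vocab, max_len=3, min_count=2):
    """Collect suffixes that occur on at least min_count training words.

    Assumed heuristic (not the paper's algorithm): a suffix shared by
    several words is treated as a candidate inflectional ending.
    Stems are kept at least two characters long.
    """
    counts = Counter()
    for word in vocab:
        for k in range(1, min(max_len, len(word) - 2) + 1):
            counts[word[-k:]] += 1
    return {suffix for suffix, c in counts.items() if c >= min_count}

def split_word(word, endings, max_len=3):
    """Split a word into (stem, ending) using the longest known ending."""
    for k in range(min(max_len, len(word) - 2), 0, -1):
        if word[-k:] in endings:
            return word[:-k], word[-k:]
    return word, ""  # no known ending: the whole word is the stem

def to_units(tokens, endings):
    """Map a token sequence to a stem/ending unit sequence."""
    units = []
    for word in tokens:
        stem, ending = split_word(word, endings)
        units.append(stem)
        if ending:
            units.append("+" + ending)  # mark endings so they cannot collide with stems
    return units

def oov_rate(test_tokens, known):
    """Fraction of test tokens that are missing from the known vocabulary."""
    if not test_tokens:
        return 0.0
    return sum(t not in known for t in test_tokens) / len(test_tokens)

# Toy data (hypothetical): inflected forms of three Slovenian nouns.
train = ["miza", "mize", "mizi", "knjiga", "knjige", "knjigo", "lipa", "lipo", "lipi"]
test = ["mizo", "knjigi", "lipe"]  # word forms never seen in training

endings = learn_endings(train)
word_oov = oov_rate(test, set(train))
unit_oov = oov_rate(to_units(test, endings), set(to_units(train, endings)))
print(f"word-level OOV: {word_oov:.2f}, unit-level OOV: {unit_oov:.2f}")
```

On this toy data every test word is unseen at word level, yet each splits into a stem and an ending that both appear in training, which is the effect the abstract attributes to stem-and-ending modelling. A real system would need a more robust segmentation criterion and an n-gram model trained over the resulting unit sequences.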
Pages: 158-161
Number of pages: 4