Large vocabulary speech recognition of Slovenian language using morphological models

被引:0
|
作者
Maucec, M [1 ]
Rotovnik, T [1 ]
Kacic, Z [1 ]
Horvat, B [1 ]
机构
[1] Univ Maribor, Inst Elect, Fac Elect Engn & Comp Sci, SI-2000 Maribor, Slovenia
关键词
language modelling; automatic continuous speech recognition; morphology; large vocabulary; data-driven methods; topic adaptation;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper concerns the development of automatic speech recognition system for Slovenian language. The large number of unique words in inflected languages is identified as the primary reason for performance degradation. This article discusses the statistical language models. A novel variation of the n-gram modelling theme is examined. Modelling units are chosen to be stems and endings instead of words. Only data-driven algorithms are employed to decompose words into stems and endings automatically. Significant reduction of OOV rate results when using stems and endings for modelling the Slovenian language. We as well discuss corpus-based topic-adapted language models. Language models are most often used in topic homogeneous environment. The problem of topic detection in highly inflected language is outlined, caused by appearance of several word forms derived from the same lemma. The problem is solved by using data-driven algorithms to group words of the same lemma into classes.
引用
收藏
页码:158 / 161
页数:4
相关论文
共 50 条
  • [1] Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition
    Pakoci, Edvin
    Popovic, Branislav
    Pekar, Darko
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2019, 2019
  • [2] Large vocabulary speech recognition with multispan statistical language models
    Bellegarda, JR
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (01): : 76 - 84
  • [3] Slovenian large vocabulary speech recognition with data-driven models of inflectional morphology
    Rotovnik, T
    Maucec, MS
    Horvat, B
    Kacic, Z
    [J]. ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 83 - 88
  • [4] Hybrid language models for out of vocabulary word detection in large vocabulary conversational speech recognition
    Yazgan, A
    Saraclar, M
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 745 - 748
  • [5] Free Acoustic and Language Models for Large Vocabulary Continuous Speech Recognition in Swedish
    Vanhainen, Niklas
    Salvi, Giampiero
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014,
  • [6] Spoken language identification using large vocabulary speech recognition.
    Hieronymus, JL
    Kadambe, S
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1780 - 1783
  • [7] Automatic language identification using large vocabulary continuous speech recognition
    Mendoza, S
    Gillick, L
    Ito, Y
    Lowe, S
    Newmann, M
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 785 - 788
  • [8] SloParl - Slovenian Parliamentary speech and text corpus for large vocabulary continuous speech recognition
    Zgank, Andrej
    Rotovnik, Tomaz
    Grasic, Matej
    Kos, Marko
    Vlaj, Damjan
    Kacic, Zdravko
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 197 - 200
  • [9] Robust spoken Language Identification using Large Vocabulary Speech Recognition.
    Hieronymus, JL
    Kadambe, S
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1111 - 1114
  • [10] Large vocabulary continuous speech recognition of an inflected language using stems and endings
    Rotovnik, Tomaz
    Maucec, Mirjam Sepesy
    Kacic, Zdravko
    [J]. SPEECH COMMUNICATION, 2007, 49 (06) : 437 - 452