Large vocabulary speech recognition of Slovenian language using morphological models

Times cited: 0

Authors:

Maucec, M [1]

Rotovnik, T [1]

Kacic, Z [1]

Horvat, B [1]

Affiliation:

[1] Univ Maribor, Inst Elect, Fac Elect Engn & Comp Sci, SI-2000 Maribor, Slovenia

Source:

IEEE REGION 8 EUROCON 2003, VOL B, PROCEEDINGS: COMPUTER AS A TOOL | 2003

Keywords:

language modelling; automatic continuous speech recognition; morphology; large vocabulary; data-driven methods; topic adaptation

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper concerns the development of an automatic speech recognition system for the Slovenian language. The large number of unique word forms in inflected languages is identified as the primary cause of performance degradation. The article focuses on statistical language models and examines a novel variation on n-gram modelling in which the modelling units are stems and endings rather than whole words. Words are decomposed into stems and endings automatically, using only data-driven algorithms. Modelling Slovenian with stems and endings yields a significant reduction in the out-of-vocabulary (OOV) rate. We also discuss corpus-based, topic-adapted language models. Language models are most often used in a topic-homogeneous environment. We outline the problem of topic detection in a highly inflected language, which arises because several word forms derived from the same lemma appear in text, and we solve it by using data-driven algorithms to group words of the same lemma into classes.
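The abstract does not describe the decomposition algorithm itself. As a rough illustration only, the minimal Python sketch below assumes a simple frequency heuristic (not the authors' method): any word-final string of up to two characters that appears on at least two distinct training words is treated as an ending, and each word is split at its longest such ending. It then compares the OOV rate of whole words with that of stem and ending units on a toy test set, and groups word forms that share a stem as a crude stand-in for the lemma classes used in topic detection. The function names, thresholds, and toy word lists are hypothetical.

```python
from collections import Counter, defaultdict


def learn_endings(vocab, max_len=2, min_stem=3, min_count=2):
    """Hypothetical heuristic: a word-final string of up to max_len characters
    is a candidate ending if it occurs on at least min_count distinct words
    while leaving a stem of at least min_stem characters."""
    counts = Counter()
    for word in vocab:
        for k in range(1, max_len + 1):
            if len(word) - k >= min_stem:
                counts[word[-k:]] += 1
    return {suffix for suffix, c in counts.items() if c >= min_count}


def split_word(word, endings, max_len=2, min_stem=3):
    """Split a word into (stem, ending) at the longest learned ending;
    the ending is '' when no learned ending applies."""
    for k in range(max_len, 0, -1):
        if len(word) - k >= min_stem and word[-k:] in endings:
            return word[:-k], word[-k:]
    return word, ""


def to_units(word, endings):
    """Map a word to its modelling units: [stem] or [stem, ending]."""
    stem, ending = split_word(word, endings)
    return [stem, ending] if ending else [stem]


def oov_rate(tokens, vocab):
    """Fraction of tokens not covered by the vocabulary."""
    return sum(t not in vocab for t in tokens) / len(tokens)


# Toy data: inflected forms of hiša (house), miza (table), mesto (city).
train_words = "hiša hiše hiši hišo miza mize mizi mesto mesta".split()
test_words = "mizo hiša mesti mize".split()

word_vocab = set(train_words)
endings = learn_endings(word_vocab)
unit_vocab = {u for w in word_vocab for u in to_units(w, endings)}
test_units = [u for w in test_words for u in to_units(w, endings)]

word_oov = oov_rate(test_words, word_vocab)
unit_oov = oov_rate(test_units, unit_vocab)

# Group word forms by shared stem, a crude stand-in for lemma classes
# when counting topic-indicative words.
classes = defaultdict(set)
for w in word_vocab:
    classes[split_word(w, endings)[0]].add(w)

print(f"word OOV rate: {word_oov:.2f}")
print(f"stem+ending OOV rate: {unit_oov:.2f}")
print(dict(sorted((stem, sorted(forms)) for stem, forms in classes.items())))
```

On this toy data the word-level OOV rate is 0.50, because mizo and mesti never occur in training, while the stem+ending OOV rate is 0.00, since every stem and ending in the test words was seen. A fixed suffix-length heuristic like this one will mis-segment many real Slovenian words; it only shows why smaller units can cover unseen word forms.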


Pages: 158-161

Number of pages: 4
