Using morphemes in language modeling and automatic speech recognition of Amharic

Cited by: 1

Authors:

Tachbelie, Martha Yifiru [1]

Abate, Solomon Teferra [1]

Menzel, Wolfgang [2]

Affiliations:

[1] Univ Addis Ababa, Sch Informat Sci, Addis Ababa, Ethiopia

[2] Univ Hamburg, Dept Informat, Hamburg, Germany

Source:

NATURAL LANGUAGE ENGINEERING | 2014, Vol. 20, No. 02

DOI:

10.1017/S1351324912000356

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents morpheme-based language models developed for Amharic (a morphologically rich Semitic language) and their application to a speech recognition task. Using subwords or morphemes substantially reduced the out-of-vocabulary rate, which addresses a severe problem of morphologically rich languages. Moreover, morpheme-based language models yielded lower perplexity values than word-based models. However, when quality is compared on the basis of the probability assigned to the test sets, word-based models seem to fare better. We have studied the utility of morpheme-based language models in speech recognition systems and found that the performance of a relatively small-vocabulary (5k) speech recognition system improved significantly when morphemes were used as language modeling and dictionary units. However, as the vocabulary size increased (20k or more), the morpheme-based systems suffered from acoustic confusability and did not achieve a significant improvement over a word-based system with an equivalent vocabulary size, even with higher-order (quadrogram) n-gram language models.
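
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the English toy words, the `segment` table, and the log-probability totals are all hypothetical, chosen only to show how splitting words into morphemes can lower the out-of-vocabulary rate, and why a model can have lower perplexity while assigning a lower total probability to the test set (perplexity is normalized by the number of tokens, and a morpheme-segmented test set has more tokens).

```python
def oov_rate(test_tokens, vocabulary):
    """Fraction of test tokens that do not occur in the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for tok in test_tokens if tok not in vocabulary)
    return missing / len(test_tokens)


def perplexity(total_log10_prob, num_tokens):
    """Perplexity from the total log10 probability of a test set."""
    return 10 ** (-total_log10_prob / num_tokens)


# Hypothetical toy data (English stand-ins, not Amharic).
train_words = ["play", "played", "walk", "walking"]
test_words = ["playing", "walked"]

# Hypothetical morpheme segmentation, assumed for illustration only.
segment = {
    "play": ["play"],
    "played": ["play", "ed"],
    "walk": ["walk"],
    "walking": ["walk", "ing"],
    "playing": ["play", "ing"],
    "walked": ["walk", "ed"],
}

train_morphs = {m for w in train_words for m in segment[w]}
test_morphs = [m for w in test_words for m in segment[w]]

# Both test words are unseen as whole words, but all their morphemes are known.
word_oov = oov_rate(test_words, set(train_words))
morph_oov = oov_rate(test_morphs, train_morphs)

# Invented totals: the morpheme model assigns a LOWER total probability
# to the test set, yet its perplexity is lower because it is averaged
# over 4 morpheme tokens instead of 2 word tokens.
word_ppl = perplexity(-12.0, len(test_words))
morph_ppl = perplexity(-14.0, len(test_morphs))

print(f"OOV rate: words={word_oov:.2f}, morphemes={morph_oov:.2f}")
print(f"Perplexity: words={word_ppl:.1f}, morphemes={morph_ppl:.1f}")
```

Because the token counts differ, perplexities computed over words and over morphemes are not directly comparable, which is consistent with the abstract also comparing models by the probability they assign to the test sets.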

Pages: 235-259

Page count: 25
