A language model adaptation using multiple varied corpora

Cited by: 0

Authors:

Yamamoto, H [1]

Sagisaka, Y [1]

Affiliation:

[1] ATR Spoken Language Translation Research Laboratories, Kyoto 6190288, Japan

Source:

ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS | 2001

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A new language model adaptation scheme is proposed to cope with multiple varied speech recognition tasks. Both the topic difference and the sentence style difference resulting from the speaker's role are reflected in the proposed language model adaptation. Adaptation is carried out using two different language corpora, in each of which only the topic or only the speaker's style is matched. New word clustering techniques are introduced to extract the topic and style dependencies separately. Word neighboring characteristics in the two adaptation source data sets are regarded as different features in this clustering. All words are classified into commonly used word classes and topic- or style-dependent classes. Furthermore, words dependent on the target topic and sentence style, together with their neighboring characteristics, are emphasized according to their frequency in the adaptation target data. In the evaluation experiment, the proposed method shows a 13% lower perplexity and a 9% lower word error rate in continuous speech recognition compared with the conventional adaptation method.
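
To make the evaluation metric concrete, the following is a minimal sketch of how perplexity is computed and how a relative perplexity reduction between two models is measured. It is not the clustering method of the paper: as a stand-in it uses plain linear interpolation of two unigram models, one estimated from a topic-matched corpus and one from a style-matched corpus. The toy sentences, the add-alpha smoothing, the interpolation weight `lam`, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_a, p_b, lam):
    """Linear interpolation of two models defined on the same vocabulary."""
    return {w: lam * p_a[w] + (1.0 - lam) * p_b[w] for w in p_a}


def perplexity(model, tokens):
    """Exponential of the average negative log-probability per token."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy data, invented for illustration (not from the paper).
topic_corpus = "i would like to book a room for two nights".split()
style_corpus = "certainly sir may i have your name please".split()
target = "may i book a room please".split()

vocab = set(topic_corpus) | set(style_corpus) | set(target)
p_topic = unigram_model(topic_corpus, vocab)
p_style = unigram_model(style_corpus, vocab)
p_mix = interpolate(p_topic, p_style, lam=0.5)  # lam is an assumed weight

ppl_topic = perplexity(p_topic, target)
ppl_mix = perplexity(p_mix, target)

# Relative reduction, the same kind of figure as the "13% lower perplexity"
# reported in the abstract (the value here reflects only the toy data).
reduction = (ppl_topic - ppl_mix) / ppl_topic
print(f"topic-only: {ppl_topic:.2f}  interpolated: {ppl_mix:.2f}  "
      f"relative reduction: {reduction:.1%}")
```

A relative reduction is reported rather than an absolute difference because perplexity scales with vocabulary size and task difficulty, so percentages are easier to compare across test sets.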


Pages: 389-392

Page count: 4
