Language model and speaking rate adaptation for spontaneous presentation speech recognition

Times cited: 28

Authors:

Nanjo, H [1]

Kawahara, T [1]

Affiliation:

[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2004 / Vol. 12 / No. 4

Keywords:

acoustic modeling; language model adaptation; pronunciation modeling; speaking rate; spontaneous speech recognition;

DOI:

10.1109/TSA.2004.828641

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206; 082403;

Abstract:

This paper addresses two major problems in the automatic transcription of spontaneous presentation speech: adapting the language model, and adapting to the speaking rate (SR) of individual speakers. To cope with the large speaker-dependent variation in the expression and pronunciation of words, we first investigate the effect of statistical and context-dependent pronunciation modeling. Second, we present unsupervised methods for adapting the language model to a specific speaker and topic by 1) selecting similar texts based on word perplexity and the TF-IDF measure, and 2) making direct use of the initial recognition result to generate an enhanced model. We confirm that each of the proposed adaptation methods, as well as their combinations, reduces both perplexity and word error rate. We also present a decoding strategy adapted to the SR. In spontaneous speech, the SR is generally fast and may vary considerably, and we observe different error tendencies in portions of presentations where speech is fast and where it is slow. We therefore propose an SR-dependent decoding strategy that applies the most appropriate acoustic analysis, phone models, and decoding parameters according to the SR. Several methods are investigated, and applying them selectively improves accuracy. The combined effect of the two proposed adaptation methods is also confirmed in the transcription of real academic presentations.
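The abstract names TF-IDF as one criterion for selecting texts similar to a speaker's presentation. The following is a minimal sketch of that kind of selection, assuming whitespace-tokenized texts, a plain tf × log(N/df) weighting, and cosine similarity as the ranking score. The function names (tfidf_vectors, cosine, select_similar) and the toy corpus are hypothetical, and the perplexity criterion that the paper uses alongside TF-IDF is not shown; this is an illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document.

    Assumed weighting: relative term frequency times log(N / df).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def select_similar(query, corpus, k):
    """Rank corpus documents by TF-IDF cosine similarity to the query; return top-k indices."""
    vectors = tfidf_vectors(corpus + [query])  # query counted in df (an assumption)
    q = vectors[-1]
    scores = [cosine(q, d) for d in vectors[:-1]]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical toy data: the query stands in for an initial recognition hypothesis.
corpus = [
    "speech recognition of lecture speech".split(),
    "stock market prices fell today".split(),
    "language model adaptation for speech recognition".split(),
]
query = "adaptation of the language model for lecture speech".split()
print(select_similar(query, corpus, 2))
```

In an unsupervised setting the query would presumably be the initial recognition result rather than a manual transcript (an assumption here); the selected texts could then be pooled to train or interpolate an adapted language model.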


Pages: 391-400

Number of pages: 10
