Language model and speaking rate adaptation for spontaneous presentation speech recognition

Cited by: 28
Authors
Nanjo, H [1]
Kawahara, T [1]
Affiliation
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Source
Keywords
acoustic modeling; language model adaptation; pronunciation modeling; speaking rate; spontaneous speech recognition;
DOI
10.1109/TSA.2004.828641
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper addresses two major problems in the automatic transcription of spontaneous presentation speech: adapting the language model to individual speakers, and adapting decoding to their speaking rate (SR). To cope with the large speaker-dependent variation in the expression and pronunciation of words, we first investigate the effect of statistical and context-dependent pronunciation modeling. Second, we present unsupervised methods for adapting the language model to a specific speaker and topic by 1) selecting similar texts based on word perplexity and the TF-IDF measure, and 2) making direct use of the initial recognition result to generate an enhanced model. We confirm that all of the proposed adaptation methods, as well as their combinations, reduce both perplexity and word error rate. We also present a decoding strategy adapted to the SR. In spontaneous speech, the SR is generally fast and can vary considerably, and we observe different error tendencies in portions of presentations where speech is fast or slow. We therefore propose an SR-dependent decoding strategy that applies the most appropriate acoustic analysis, phone models, and decoding parameters according to the SR. Several such methods are investigated, and applying them selectively improves accuracy. The combined effect of the two proposed adaptation methods is also confirmed in the transcription of real academic presentations.
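The abstract mentions selecting texts similar to a speaker and topic using the TF-IDF measure. The following is a minimal sketch of that kind of selection, not the authors' implementation. It assumes whitespace-tokenized texts, term frequency normalized by document length, and IDF = log(N/df) computed over the query plus the candidate texts. Using the first-pass recognition hypothesis as the query is also an assumption, and all function names are hypothetical. Candidate texts are ranked by cosine similarity to the query.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one sparse TF-IDF vector (dict word -> weight) per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each word at least once.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        total = len(doc)
        # Length-normalized term frequency times log inverse document frequency.
        vectors.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_similar_texts(hypothesis, corpus, top_k):
    """Return indices of the top_k corpus texts most similar to the hypothesis."""
    # The hypothesis (query) is included when computing IDF (an assumption).
    vectors = tf_idf_vectors([hypothesis] + corpus)
    query, candidates = vectors[0], vectors[1:]
    scored = [(cosine(query, vec), idx) for idx, vec in enumerate(candidates)]
    # Highest similarity first; ties broken by original corpus order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


hypothesis = "speech recognition of spontaneous lecture speech".split()
corpus = [
    "language model adaptation for lecture speech recognition".split(),
    "stock market prices fell sharply today".split(),
    "acoustic model for spontaneous speech".split(),
]
print(select_similar_texts(hypothesis, corpus, top_k=2))
```

In a real setup, the selected texts would then feed the adapted language model. The abstract also mentions a perplexity-based criterion, which this sketch does not cover.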
Pages: 391–400
Number of pages: 10