Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System

Authors
Saykham, Kwanchiva [1, 2]
Chotimongkol, Ananlada [1]
Wutiwiwatchai, Chai [1]
Affiliations
[1] National Electronics and Computer Technology Center (NECTEC), Khlong Luang, Pathum Thani 12120, Thailand
[2] Thammasat University, Sirindhorn International Institute of Technology, School of Information and Computer Technology, Pathum Thani 12000, Thailand
DOI
Not available
Chinese Library Classification (CLC) number
H [Languages and scripts]
Subject classification code
05
Abstract
This paper investigates the effectiveness of online temporal language model (LM) adaptation applied to a Thai broadcast news transcription task. Our adaptation scheme works as follows. First, an initial LM is trained on the broadcast news transcriptions available during the development period. The LM is then adapted over time with more recent broadcast news transcriptions and online news articles that become available during deployment, especially data from the same time period as the broadcast news speech being recognized. We found that data closer in time are more similar in terms of perplexity and are therefore more suitable for LM adaptation. LMs adapted over time with more recent news data outperform the static LM trained only on the initial broadcast news data, in terms of both perplexity and word error rate (WER). Adaptation data from broadcast news transcriptions yielded relative improvements of 38.3% in perplexity and 7.1% in WER. Although online news articles achieved a smaller improvement, they are still a useful resource because they can be obtained automatically. Better data pre-processing and data selection techniques based on text similarity could be applied to the news articles to build further on this promising result.
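The adaptation scheme described in the abstract can be illustrated with a small sketch. The abstract does not say how the initial LM and the recent-data LM are combined or how much recent data is used, so this example assumes linear interpolation of add-one-smoothed unigram models, with a hypothetical interpolation weight `lam` and a hypothetical sliding `window` of recent periods; a real transcription system would use higher-order n-gram LMs. All function names and the toy data are illustrative, not the authors' implementation. For each period, the sketch compares the perplexity of the static LM and the adapted LM on that period's text and reports the relative improvement, computed as (baseline - adapted) / baseline.

```python
import math
from collections import Counter

UNK = "<unk>"


def train_unigram(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram LM over a fixed vocabulary plus <unk>."""
    counts = Counter(w if w in vocab else UNK for s in sentences for w in s)
    total = sum(counts.values())
    size = len(vocab) + 1  # +1 for the <unk> token
    return {w: (counts[w] + alpha) / (total + alpha * size)
            for w in list(vocab) + [UNK]}


def interpolate(static_lm, recent_lm, lam):
    """Assumed combination: P(w) = lam * P_recent(w) + (1 - lam) * P_static(w)."""
    return {w: lam * recent_lm[w] + (1.0 - lam) * static_lm[w] for w in static_lm}


def perplexity(lm, sentences):
    """Per-word perplexity; out-of-vocabulary words are scored as <unk>."""
    log_prob, n_words = 0.0, 0
    for s in sentences:
        for w in s:
            log_prob += math.log(lm.get(w, lm[UNK]))
            n_words += 1
    return math.exp(-log_prob / n_words)


def relative_improvement(baseline, adapted):
    """Relative reduction in percent (applies to perplexity or WER)."""
    return 100.0 * (baseline - adapted) / baseline


def run_online(initial_data, periods, vocab, lam=0.5, window=2):
    """For each period t, adapt the static LM with adaptation text from the
    last `window` periods (including period t itself, e.g. news articles from
    the same time period) and score that period's test text with both LMs."""
    static_lm = train_unigram(initial_data, vocab)
    results = []
    for t, (_, test) in enumerate(periods):
        start = max(0, t - window + 1)
        recent = [s for adapt, _ in periods[start:t + 1] for s in adapt]
        adapted_lm = interpolate(static_lm, train_unigram(recent, vocab), lam)
        results.append((perplexity(static_lm, test), perplexity(adapted_lm, test)))
    return results


if __name__ == "__main__":
    # Toy, pre-tokenized data; each period is (adaptation text, test text).
    initial = [["election", "vote", "party"], ["election", "party", "minister"]]
    periods = [
        ([["flood", "rain", "flood"]], [["flood", "rain"]]),
        ([["rain", "dam", "flood"]], [["dam", "flood"]]),
    ]
    # Toy vocabulary built from all data for simplicity; a real system fixes it in advance.
    vocab = {w for s in initial for w in s}
    vocab |= {w for adapt, test in periods for s in adapt + test for w in s}
    for t, (ppl_static, ppl_adapted) in enumerate(run_online(initial, periods, vocab)):
        gain = relative_improvement(ppl_static, ppl_adapted)
        print(f"period {t}: static PPL {ppl_static:.1f}, "
              f"adapted PPL {ppl_adapted:.1f}, relative gain {gain:.1f}%")
```

With this toy data, the recent-data LM assigns higher probability to each period's topic words (flood, rain, dam), so the adapted LM's perplexity falls below the static LM's in both periods. The interpolation weight and window size are free parameters in this sketch; in practice they would be tuned on held-out data.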
Pages: 1690-1694
Number of pages: 5