Language modeling and transcription of the TED corpus lectures

Authors
Leeuwis, E [1]
Federico, M [1]
Cettolo, M [1]
Affiliation
[1] Univ Twente, Dept Comp Sci, NL-7500 AE Enschede, Netherlands
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Transcribing lectures is a challenging task for both acoustic and language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using, respectively, 8 hours of TED transcripts and various types of text: conference proceedings, lecture transcripts, and conversational speech transcripts. Adaptation of the language model to individual speakers was then investigated by exploiting different kinds of information: automatic transcripts of the talk, the title of the talk, the abstract, and, finally, the paper. In the last case, a word error rate (WER) of 39.2% was achieved.
Pages: 232-235
Number of pages: 4