Language modeling and transcription of the TED corpus lectures

Cited by: 0

Authors:

Leeuwis, E [1]

Federico, M [1]

Cettolo, M [1]

Affiliations:

[1] Univ Twente, Dept Comp Sci, NL-7500 AE Enschede, Netherlands

Source:

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I | 2003

Keywords:

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Transcribing lectures is a challenging task, both in acoustic and in language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using respectively 8 hours of TED transcripts and various types of texts: conference proceedings, lecture transcripts, and conversational speech transcripts. Then, adaptation of the language model to single speakers was investigated by exploiting different kinds of information: automatic transcripts of the talk, the title of the talk, the abstract and, finally, the paper. In the last case, a 39.2% WER was achieved.


Pages: 232-235

Page count: 4
