Language modeling and transcription of the TED corpus lectures

Authors
Leeuwis, E [1]
Federico, M [1]
Cettolo, M [1]
Affiliation
[1] Univ Twente, Dept Comp Sci, NL-7500 AE Enschede, Netherlands
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Transcribing lectures is a challenging task for both acoustic and language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we focused our efforts on language modeling. Baseline acoustic and language models were developed using, respectively, 8 hours of transcribed TED speech and various types of text: conference proceedings, lecture transcripts, and conversational speech transcripts. We then investigated adapting the language model to individual speakers by exploiting different kinds of information: automatic transcripts of the talk, the title of the talk, the abstract and, finally, the paper itself. In the last case, a word error rate (WER) of 39.2% was achieved.
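The abstract does not state how the speaker-specific text (talk transcripts, title, abstract, or paper) was folded into the language model. One common approach, assumed here purely for illustration and not necessarily the authors' method, is linear interpolation of a background model with a small model estimated from talk-specific text. The Python sketch keeps things short with add-one-smoothed unigram models; the function names, toy texts, and the weight `lam` are hypothetical. A real system would interpolate n-gram models and would typically tune the weight on held-out data.

```python
import math
from collections import Counter

# Hypothetical sketch: speaker adaptation by linear interpolation of unigram
# language models. This is an assumed technique, not the authors' method.


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(background, adapted, lam):
    """P(w) = lam * P_adapted(w) + (1 - lam) * P_background(w)."""
    return {w: lam * adapted[w] + (1.0 - lam) * background[w] for w in background}


def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence (all tokens must be in vocab)."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


if __name__ == "__main__":
    # Toy texts standing in for background data, talk-specific text
    # (e.g. the speaker's paper), and a test transcript. All hypothetical.
    background_text = "the speech is about the data and the model".split()
    talk_text = "acoustic model adaptation for lecture speech".split()
    test_text = "lecture speech model adaptation".split()

    vocab = set(background_text) | set(talk_text) | set(test_text)
    bg = unigram_model(background_text, vocab)
    talk = unigram_model(talk_text, vocab)
    mixed = interpolate(bg, talk, lam=0.5)

    print("background perplexity:", perplexity(bg, test_text))
    print("adapted perplexity:   ", perplexity(mixed, test_text))
```

On this toy data the interpolated model assigns higher probability to every test word than the background model alone, so its perplexity is lower. Interpolation is attractive for adaptation because even a few sentences of talk-specific text can shift probability mass toward the speaker's vocabulary while the background model still covers everything else.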
Pages: 232-235
Number of pages: 4