Open Source Speech and Language Resources for Frisian

Cited by: 7
Authors
Yilmaz, Emre [1]
van den Heuvel, Henk [1]
Dijkstra, Jelske [2]
Van de Velde, Hans [2]
Kampstra, Frederik [3]
Algra, Jouke [3]
Van Leeuwen, David [1]
Affiliations
[1] Radboud Univ Nijmegen, CLS CLST, Nijmegen, Netherlands
[2] Fryske Akad, Leeuwarden, Netherlands
[3] Omrop Fryslan, Leeuwarden, Netherlands
Keywords
Open source; Frisian language; speech data; automatic speech recognition; RECOGNITION; CORPUS;
DOI
10.21437/Interspeech.2016-48
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we present several open source speech and language resources for the under-resourced Frisian language. Frisian is mostly spoken in the province of Fryslan, which is located in the north of the Netherlands. The native speakers of Frisian are Frisian-Dutch bilinguals and often code-switch in daily conversations. The resources presented in this paper include a code-switching speech database containing radio broadcasts, a phonetic lexicon with more than 70k words, and a language model trained on a text corpus with more than 38M words. With this contribution, we aim to share the Frisian resources we have collected in the scope of the FAME! project, in which a spoken document retrieval system is built for the disclosure of the regional broadcaster's radio archives. These resources enable research on code-switching and longitudinal speech and language change. Moreover, a sample automatic speech recognition (ASR) recipe for the Kaldi toolkit will also be provided online to facilitate Frisian ASR research.
Pages: 1536 - 1540
Page count: 5
Related papers
50 in total
  • [1] An open source part-of-speech tagger for Norwegian: Building on existing language resources
    Marco, Cristina S.
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 4111 - 4117
  • [2] Gender Representation in Open Source Speech Resources
    Garnerin, Mahault
    Rossato, Solange
    Besacier, Laurent
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6599 - 6605
  • [3] Speech Resources in the Tamasheq Language
    Boito, Marcely Zanon
    Bougares, Fethi
    Barbier, Florentin
    Gahbiche, Souhir
    Barrault, Loic
    Rouvier, Mickael
    Esteve, Yannick
    [J]. LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 2066 - 2071
  • [4] A DICTIONARY OF THE FRISIAN LANGUAGE - FRISIAN - VANVEEN,KF
    HEESTERMANS, H
    [J]. TIJDSCHRIFT VOOR NEDERLANDSE TAAL-EN LETTERKUNDE, 1988, 104 (01): : 79 - 80
  • [5] Open source About the resources
    Unknown
    [J]. ASTRONOMY & GEOPHYSICS, 2020, 61 (06) : 23 - 23
  • [6] On the Development of Speech Resources for the Mixtec Language
    Caballero-Morales, Santiago-Omar
    [J]. SCIENTIFIC WORLD JOURNAL, 2013,
  • [7] Speech and Language Resources for LVCSR of Russian
    Zablotskiy, Sergey
    Shvets, Alexander
    Sidorov, Maxim
    Semenkin, Eugene
    Minker, Wolfgang
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 3374 - 3377
  • [8] Open-Source Boundary-Annotated Corpus for Arabic Speech and Language Processing
    Brierley, Claire
    Sawalha, Majdi
    Atwell, Eric
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 1011 - 1016
  • [9] Frisian as an Endangered Language: An Overview
    Buczek, Katarzyna
    [J]. ACADEMIC JOURNAL OF MODERN PHILOLOGY, 2019, 8 : 41 - 50
  • [10] Spoken language resources for Cantonese speech processing
    Lee, T
    Lo, WK
    Ching, PC
    Meng, H
    [J]. SPEECH COMMUNICATION, 2002, 36 (3-4) : 327 - 342