Speech Resources in the Tamasheq Language

Authors
Boito, Marcely Zanon [1]
Bougares, Fethi [2]
Barbier, Florentin [3]
Gahbiche, Souhir [3]
Barrault, Loic [2]
Rouvier, Mickael [1]
Esteve, Yannick [1]
Affiliations
[1] Avignon Univ, LIA, Avignon, France
[2] Le Mans Univ, LIUM, Le Mans, France
[3] Airbus, Toulouse, France
Funding
European Union Horizon 2020
Keywords
speech corpus; speech translation; Tamasheq; Zarma; Hausa; Fulfulde; French
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In this paper, we present two datasets for Tamasheq, a developing language spoken mainly in Mali and Niger. Both datasets were made available for the IWSLT 2022 low-resource speech translation track and consist of radio recordings from daily broadcast news in Niger (Studio Kalangou) and Mali (Studio Tamani). We share (i) a large amount of unlabeled audio data (671 hours) in five languages: French from Niger, Fulfulde, Hausa, Tamasheq, and Zarma; and (ii) a smaller 17-hour parallel corpus of Tamasheq audio recordings with utterance-level translations into French. All of this data is shared under the Creative Commons BY-NC-ND 3.0 license. We hope these resources will inspire the speech community to develop and benchmark models for Tamasheq.
Pages: 2066-2071
Number of pages: 6