Speech Generation for Indigenous Language Education

Cited by: 1
Authors
Pine, Aidan [1 ]
Cooper, Erica [2 ]
Guzman, David [1 ]
Joanis, Eric [1 ]
Kazantseva, Anna [1 ]
Krekoski, Ross [4 ]
Kuhn, Roland [1 ]
Larkin, Samuel [1 ]
Littell, Patrick [1 ]
Lothian, Delaney [1 ]
Martin, Akwiratekha' [1 ]
Richmond, Korin [3 ]
Tessier, Marc [1 ]
Valentini-Botinhao, Cassia [3 ]
Wells, Dan [3 ]
Yamagishi, Junichi [2 ]
Affiliations
[1] Natl Res Council Canada, 1200 Montreal Rd, Ottawa, ON K1A 0R6, Canada
[2] Natl Inst Informat, 2 Chome-1-2 Hitotsubashi, Tokyo, 1018430, Japan
[3] Univ Edinburgh, 10 Crichton St, Edinburgh EH8 9AB, Scotland
[4] Univ Nuhelotine Thaiyotsi Nistameyimakanak Blue Qu, 3 Airport Rd N, St Paul, AB T0A 3A0, Canada
Source
Keywords
Speech synthesis; Text-to-speech; Low-resource languages; Indigenous languages; Language education; Language revitalization; TEXT-TO-SPEECH; PHONOLOGICAL FEATURES; SPEAK;
DOI
10.1016/j.csl.2024.101723
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multiyear, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.
Pages: 30