Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech

Cited by: 2
Authors
Singh, Abhayjeet [1]
Nagireddi, Amala [1]
Jayakumar, Anjali [1]
Deekshitha, G. [1]
Bandekar, Jesuraja [1]
Roopa, R. [1]
Badiger, Sandhya [1]
Udupa, Sathvik [1]
Kumar, Saurabh [1]
Ghosh, Prasanta Kumar [1]
Murthy, Hema A. [2]
Zen, Heiga [3]
Kumar, Pranaw [4]
Kant, Kamal [4]
Bole, Amol [4]
Singh, Bira Chandra [4]
Tokuda, Keiichi [5]
Hasegawa-Johnson, Mark [6]
Olbrich, Philipp [7]
Affiliations
[1] Indian Inst Sci IISc, Elect Engn Dept, Bangalore 560012, Karnataka, India
[2] Indian Inst Technol, Dept Comp Sci & Engn, Madras 600036, Tamil Nadu, India
[3] Google, Tokyo 1500002, Japan
[4] Pune Univ Campus, CDAC, Pune 411007, Maharashtra, India
[5] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi 4668555, Japan
[6] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[7] Deutsch Gesell Int Zusammenarbeit GIZ GmbH, D-53113 Bonn, Germany
Keywords
End-to-end model; data-constrained multi-speaker; model compression; multi-lingual TTS; speech synthesis; text-to-speech (TTS); SELECTION
DOI
10.1109/OJSP.2024.3379092
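Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809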
Abstract
The Lightweight, Multi-speaker, Multi-lingual Indic Text-to-Speech (LIMMITS'23) challenge is organized as part of the ICASSP 2023 Signal Processing Grand Challenge. LIMMITS'23 aims to develop a lightweight, multi-speaker, multi-lingual text-to-speech (TTS) model using datasets in Marathi, Hindi, and Telugu, with at least 40 hours of data released for each of the male and female voice artists in each language. The challenge encourages the advancement of TTS in Indian languages as well as the development of techniques for TTS data selection and model compression. The three tracks of LIMMITS'23 have given researchers and practitioners around the world an opportunity to explore state-of-the-art techniques in TTS research.
Pages: 790-798
Page count: 9