A Multilingual to Polyglot Speech Synthesizer for Indian Languages Using a Voice-Converted Polyglot Speech Corpus

Cited by: 3
Authors
Vijayalakshmi, P. [1]
Ramani, B. [1]
Jeeva, M. P. Actlin [1]
Nagarajan, T. [1]
Affiliations
[1] SSN Coll Engn, Old Mahabalipuram Rd, Madras, Tamil Nadu, India
Keywords
Polyglot; Multilingual; HMM; GMM; Voice conversion; SELECTION
DOI
10.1007/s00034-017-0659-6
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract
A multilingual synthesizer produces speech that is intelligible to human listeners for any given monolingual or mixed-language text. The need for such a synthesizer arises in a country like India, where multiple languages coexist. In the current work, multilingual synthesizers are developed using the HMM-based speech synthesis technique. However, for mixed-language text, the synthesized speech switches speakers at language-switching points, which is quite annoying to the listener. This is because the speech data used for training are collected for each language from a different (native) speaker. To overcome speaker switching at language-switching points, a polyglot speech synthesizer is developed using a polyglot speech corpus (all the speech data in a single speaker's voice). The polyglot speech corpus is obtained using a cross-lingual voice conversion (CLVC) technique. In the current work, the polyglot synthesizer is developed for five languages, namely Tamil, Telugu, Hindi, Malayalam and Indian English. The regional Indian languages considered are acoustically similar to a certain extent, and hence a common phoneset and question set are used to build the synthesizer. Experiments are carried out by developing various bilingual polyglot synthesizers to choose the language (and thereby the speaker) that can be considered as the target for the polyglot synthesizer. The performance of the synthesizers is evaluated subjectively, for speaker/language switching using a perceptual test and for quality using the mean opinion score. Speaker identity is evaluated objectively using a GMM-based speaker identification system. Further, the polyglot synthesizer developed using the polyglot speech corpus is compared with the adaptation-based polyglot synthesizer in terms of the quality of the synthesized speech and the amount of data required for adaptation and voice conversion. It is observed that the performance of the polyglot synthesizer developed using the polyglot speech corpus obtained from the CLVC technique is better than or almost similar to that of the adaptation-based polyglot synthesizer.
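The abstract mentions an objective check of speaker identity with a GMM-based speaker identification system. As a rough illustration of the scoring step only, the sketch below computes the average per-frame log-likelihood of some feature frames under each speaker's diagonal-covariance GMM and picks the best-scoring speaker. The model parameters, speaker names, and two-dimensional features are hypothetical toy values, not taken from the paper; a real system would train the GMMs on spectral features from each speaker's recordings.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of frames under a GMM.

    gmm is a list of (weight, mean, var) tuples, one per mixture component.
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over components for numerical stability.
        comp = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total / len(frames)

def identify_speaker(frames, speaker_models):
    """Return (best_speaker, scores) using maximum average log-likelihood."""
    scores = {
        name: gmm_log_likelihood(frames, gmm)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores

# Hand-set toy models (assumed values for illustration only).
speaker_models = {
    "target_speaker": [(0.5, [0.0, 0.0], [1.0, 1.0]),
                       (0.5, [1.0, 1.0], [1.0, 1.0])],
    "other_speaker": [(0.5, [5.0, 5.0], [1.0, 1.0]),
                      (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Toy "synthesized speech" frames lying close to the target speaker's model.
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best, scores = identify_speaker(frames, speaker_models)
print(best)
```

In this kind of setup, synthesized utterances from a polyglot synthesizer would be expected to score highest under the target speaker's model regardless of the language being spoken.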
Pages: 2142-2163
Number of pages: 22
Related papers
40 in total
  • [1] A Multilingual to Polyglot Speech Synthesizer for Indian Languages Using a Voice-Converted Polyglot Speech Corpus
    P. Vijayalakshmi
    B. Ramani
    M. P. Actlin Jeeva
    T. Nagarajan
    [J]. Circuits, Systems, and Signal Processing, 2018, 37 : 2142 - 2163
  • [2] Voice Conversion-Based Multilingual to Polyglot Speech Synthesizer for Indian Languages
    Ramani, B.
    Jeeva, Actlin M. P.
    Vijayalakshmi, P.
    Nagarajan, T.
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE OF IEEE REGION 10 (TENCON), 2013,
  • [3] Cross-Lingual Voice Conversion-Based Polyglot Speech Synthesizer for Indian Languages
    Ramani, B.
    Jeeva, Actlin M. P.
    Vijayalakshmi, P.
    Nagarajan, T.
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 775 - 779
  • [4] Polyglot and Speech Corpus Tools: a system for representing, integrating, and querying speech corpora
    McAuliffe, Michael
    Stengel-Eskin, Elias
    Socolof, Michaela
    Sonderegger, Morgan
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3887 - 3891
  • [5] Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition
    Zevallos, Rodolfo
    Camacho, Luis
    Melgarejo, Nelsi
    [J]. LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 5029 - 5034
  • [6] Indian Languages Corpus for Speech Recognition
    Basu, Joyanta
    Khan, Soma
    Roy, Rajib
    Saxena, Babita
    Ganguly, Dipankar
    Arora, Sunita
    Arora, Karunesh Kumar
    Bansal, Shweta
    Agrawal, Shyam Sunder
    [J]. 2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2019, : 13 - 18
  • [7] New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer
    Latorre, Javier
    Iwano, Koji
    Furui, Sadaoki
    [J]. SPEECH COMMUNICATION, 2006, 48 (10) : 1227 - 1242
  • [8] Common Voice: A Massively-Multilingual Speech Corpus
    Ardila, Rosana
    Branson, Megan
    Davis, Kelly
    Henretty, Michael
    Kohler, Michael
    Meyer, Josh
    Morais, Reuben
    Saunders, Lindsay
    Tyers, Francis M.
    Weber, Gregor
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4218 - 4222
  • [9] Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS)
    Zhang, Ziyao
    Falai, Alessio
    Sanchez, Ariadna
    Angelini, Orazio
    Yanagisawa, Kayoko
    [J]. INTERSPEECH 2022, 2022, : 2353 - 2357
  • [10] Multilingual speech mode classification model for Indian languages
    Tripathi, Kumud
    Rao, K. Sreenivasa
    [J]. 2020 TWENTY SIXTH NATIONAL CONFERENCE ON COMMUNICATIONS (NCC 2020), 2020,