Development of robotic voice conversion for RIBO using text-to-speech synthesis

Cited by: 0
Authors
Hossain, Md. Jakir [1]
Al Amin, Sayed Mahmud [1]
Islam, Md. Saiful [1]
Marium-E-Jannat [1]
Affiliations
[1] Shahjalal Univ Sci & Technol Sylhet, Dept Comp Sci & Engn, Sylhet, Bangladesh
Keywords
TTS; RIBO; Diode; Ring modulator; VCA; Transformer; RF
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
RIBO is the first social interaction robot in Bangladesh. It was designed and developed by the 'ROBO SUST' team of Shahjalal University of Science and Technology. RIBO can move its hands and eyes up and down, walk very slowly, and speak some recorded Bengali sentences. The 'ROBO SUST' team is now developing RIBO further so that it can communicate with humans. One part of that communication is converting Bengali text to Bengali speech in a robotic voice. In this article, we propose a method that converts Bengali text to speech in a robotic voice using the Google text-to-speech (TTS) system and a ring modulator. Several existing TTS synthesizer systems can convert Bengali text to Bengali speech; among them, the Google TTS system performs better for Bengali. Hence, we use the Google TTS system to produce Bengali speech from any written Bengali text. The Google TTS synthesizer produces speech as an audio object file, which can be converted to an .mp3 file. We then modify this .mp3 file using the characteristics of a diode and the ring modulator concept to obtain a machine voice. After changing the pitch and speed of this machine voice, we obtain the final robotic voice, which will be used as RIBO's voice.
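The last two stages described in the abstract (ring modulation, then a pitch and speed change) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ideal ring modulator (plain multiplication by a sine carrier, without modelling the diode characteristic), uses simple resampling that shifts pitch and speed together, and replaces the decoded TTS audio with a synthetic tone. The function names, the 50 Hz carrier, and the 1.1 speed factor are hypothetical choices made only for illustration.

```python
import math


def ring_modulate(samples, sample_rate, carrier_hz=50.0):
    """Ideal ring modulation: multiply each sample by a sine carrier.

    Assumption: the diode behaviour of an analogue ring modulator is ignored.
    """
    return [
        s * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        for n, s in enumerate(samples)
    ]


def change_speed(samples, factor):
    """Resample by linear interpolation.

    A factor above 1 shortens the signal, so it plays faster and at a
    higher pitch; a factor below 1 does the opposite.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    out_len = max(1, int(len(samples) / factor))
    out = []
    for i in range(out_len):
        pos = i * factor                      # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)    # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    rate = 8000
    # Stand-in for decoded TTS audio: one second of a 220 Hz tone.
    speech = [math.sin(2.0 * math.pi * 220.0 * n / rate) for n in range(rate)]
    robotic = change_speed(ring_modulate(speech, rate, carrier_hz=50.0), 1.1)
    print(len(robotic))
```

In practice the input samples would come from the decoded .mp3 file produced by the TTS step, and the carrier frequency and speed factor would be tuned by ear.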
Pages: 422-425
Number of pages: 4