A Flexible Architecture for Urdu Phonemes-Based Concatenative Speech Synthesis

Citations: 0
Authors
Ahmad, Muhammad Rizwan [1 ]
Arshad, Muhammad Junaid [1 ]
Affiliation
[1] Univ Engn & Technol, Dept Comp Sci, Lahore, Pakistan
Keywords
Articulatory; Text-to-Speech; Formant; Concatenative; Natural Language Processing; Waveforms; Speech Units; Phonemes; Speech Synthesis;
DOI
10.22581/muet1982.1603.07
Chinese Library Classification number
T [Industrial Technology]
Subject classification code
08
Abstract
Text-to-speech (TTS) synthesis systems are used extensively across the world to improve the accessibility of information and to enable people with disabilities to interact directly with computers and benefit from this technological revolution. Various TTS synthesis techniques have been used, each with its own advantages and limitations. No existing architecture for Urdu TTS synthesis is based on a concatenative synthesis strategy that handles homographs and avoids the unnatural, robotic-sounding speech produced by the use of diphones. In this paper, we propose a flexible architecture for an Urdu TTS synthesis system that uses a concatenative synthesis strategy, because this approach can join units from a small speech corpus to generate natural and intelligible sound. The main aims of this research are to disambiguate homographs in the Urdu language and to avoid unnatural, robotic-sounding speech. Finally, the effectiveness of the system is tested in terms of intelligibility and acceptability at the word and sentence levels. The intelligibility rate is close to 80% and 65%, while the acceptability rate for naturalness is 95% (75% natural, 20% acceptable).
Pages: 373-380
Number of pages: 8