INTELLIGENT SYNTHESIS TECHNOLOGY OF CHINESE SPEECH FOR SPEECH NAVIGATION

Author
Tuerxun, Kade [1]
Affiliation
[1] Xinjiang Univ Finance & Econ, Sch Chinese Language & Culture, Urumqi 830012, Peoples R China
Source
Keywords
ALBert; Chinese speech synthesis; FastSpeech2; Transformer structure; GAN; FEATURE-EXTRACTION
DOI
10.2316/J.2024.206-1050
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
To improve the quality and speed of speech synthesis (SS) in speech navigation apps, this study proposes an ALBert-based polyphonic character disambiguation method for text-to-phoneme conversion and constructs a non-autoregressive Chinese SS technique based on the Transformer. The results indicate that ALBert achieves the best disambiguation performance, with an average polyphonic character disambiguation accuracy of 94.2%, compared with 83.4% for the maximum entropy model (MEM) algorithm, 83.7% for the tree-guided transformation-based learning (TGTBL) algorithm, 84.3% for the pypinyin tool library, and 87.1% for conditional random fields (CRF). Among the common polyphonic characters, "chao" has the highest recognition accuracy, 98.5%, and "wei" has the highest frequency, 11%. The FastSpeech2-GAN model performs best at 100 k training steps, with a mean opinion score (MOS) of 3.94 and a mel-cepstral distortion (MCD) of 2.8911. Among the SS models compared, it ranks first, followed by the FastSpeech2 model with a MOS of 3.88 and an MCD of 2.9168. The reported real-time rate is 0.011, the same as that of FastSpeech2. The improved Transformer-based non-autoregressive Chinese SS technology thus makes progress in both synthesis speed and synthesis quality.
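For readers unfamiliar with the two objective metrics quoted in the abstract, the following is a minimal sketch of how a mel-cepstral distance/distortion (MCD) value and a real-time rate are commonly computed. It is not taken from the paper: the function names, the choice to skip the 0th (energy) coefficient, and the assumption that the two feature sequences are already time-aligned are all illustrative assumptions, and the abstract does not state which MCD variant the author used.

```python
import math


def mel_cepstral_distortion(ref, syn):
    """Mean MCD in dB over frames (hypothetical helper, not the paper's code).

    ref, syn: lists of frames, each frame a list of mel-cepstral coefficients.
    Assumes the sequences are already time-aligned and equally long.
    Coefficient 0 (energy) is skipped, a common but not universal convention.
    """
    if len(ref) != len(syn) or not ref:
        raise ValueError("sequences must be non-empty and frame-aligned")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for r, s in zip(ref, syn):
        squared_diff = sum((a - b) ** 2 for a, b in zip(r[1:], s[1:]))
        total += scale * math.sqrt(squared_diff)
    return total / len(ref)


def real_time_factor(synthesis_seconds, audio_seconds):
    """Ratio of compute time to audio duration; below 1 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds
```

Under this definition, a real-time rate of 0.011 would correspond to roughly 0.11 s of compute for 10 s of synthesized audio; lower MCD indicates synthesized spectra closer to the reference.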
Pages: 504-514
Number of pages: 11