End-to-end Speech Synthesis for Tibetan Lhasa Dialect

Cited by: 2
Authors
Luo, Lisai [1]
Li, Guanyu [1]
Gong, Chunwei [1]
Ding, Hailan [1]
Affiliations
[1] Northwest Minzu Univ, Key Lab Natl Language Intelligent Proc Gansu Prov, Lanzhou, Gansu, Peoples R China
DOI
10.1088/1742-6596/1187/5/052061
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Speech synthesis for the Tibetan Lhasa dialect is implemented on the basis of Tacotron, a novel end-to-end speech synthesis framework. The training transcripts use a phoneme list transcribed from Tibetan characters, and the feature parameters are extracted from the mel-spectrogram. The model is then trained on the mapping from characters to spectra. Tibetan is an important minority language of the Chinese nation, but there is currently little research on it. The experimental results were compared with traditional speech synthesis methods: the audio quality is significantly better than that of the traditional GMM-HMM approach in both naturalness and rhythm. This work provides a crucial reference for later research on Tibetan and promotes the development of Tibetan language research.
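The abstract mentions mel-spectrogram features without stating any parameters. As a rough illustration only, the sketch below shows how the mel scale spaces filter-bank center frequencies, using the common HTK-style conversion formula. The function names, the 80 bands, and the 0 to 8000 Hz range are assumptions chosen for illustration; none of them are taken from the paper.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels, f_min, f_max):
    """Return n_mels center frequencies (Hz), evenly spaced on the mel scale.

    The two band edges f_min and f_max are excluded, so the spacing uses
    n_mels + 1 equal steps between them.
    """
    mel_lo = hz_to_mel(f_min)
    mel_hi = hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_mels + 1)
    return [mel_to_hz(mel_lo + step * (i + 1)) for i in range(n_mels)]


# Assumed illustrative settings (not from the paper): 80 bands up to 8 kHz.
centers = mel_center_frequencies(n_mels=80, f_min=0.0, f_max=8000.0)
```

Because the centers are evenly spaced in mels rather than in Hz, they are dense at low frequencies and sparse at high frequencies, which roughly follows how pitch differences are perceived.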
Pages: 6
Related papers
  • [1] Lhasa-Tibetan Speech Synthesis Using End-to-End Model
    Zhao, Yue
    Hu, Panhua
    Xu, Xiaona
    Wu, Licheng
    Li, Xiali
    [J]. IEEE ACCESS, 2019, 7: 140305-140311
  • [2] End-to-End Speech Synthesis for Tibetan Multidialect
    Xu, Xiaona
    Yang, Li
    Zhao, Yue
    Wang, Hui
    [J]. COMPLEXITY, 2021, 2021
  • [3] Effective Training End-to-End ASR systems for Low-resource Lhasa Dialect of Tibetan Language
    Pan, Lixin
    Li, Sheng
    Wang, Longbiao
    Dang, Jianwu
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 1152-1156
  • [4] A Streaming End-to-End Speech Recognition Approach Based on WeNet for Tibetan Amdo Dialect
    Wang, Chao
    Wen, Yao
    Lhamo, Phurba
    Tashi, Nyima
    [J]. 2022 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING, MLNLP 2022, 2022: 317-322
  • [5] End-to-end Tibetan Ando dialect speech recognition based on hybrid CTC/attention architecture
    Sun, Jingwen
    Zhou, Gang
    Yang, Hongwu
    Wang, Man
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 628-632
  • [6] Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020: 297-301
  • [7] End-to-end Tibetan Speech Synthesis Based on Phones and Semi-syllables
    Li, Guanyu
    Luo, Lisai
    Gong, Chunwei
    Lv, Shiliang
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 1294-1297
  • [8] Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech
    Ghorbani, Shahram
    Hansen, John H. L.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 762-774
  • [9] End-to-End Binaural Speech Synthesis
    Huang, Wen-Chin
    Markovic, Dejan
    Gebru, Israel D.
    Menon, Anjali
    Richard, Alexander
    [J]. INTERSPEECH 2022, 2022: 1218-1222
  • [10] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018: 640-647