An open speech resource for Tibetan multi-dialect and multitask recognition

Cited by: 6
Authors
Zhao, Yue [1]
Xu, Xiaona [1]
Yue, Jianjian [1]
Song, Wei [1]
Li, Xiali [1]
Wu, Licheng [1]
Ji, Qiang [2]
Affiliations
[1] Minzu Univ China, Sch Informat & Engn, Beijing, Peoples R China
[2] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
Keywords
Tibetan language; multi-dialect speech recognition; multitask learning; speech corpus; FEATURES;
DOI
10.1504/IJCSE.2020.107351
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper introduces a Tibetan multi-dialect data resource for multitask speech research. It can be used for Tibetan multi-dialect speech recognition, Tibetan speaker recognition, Tibetan dialect identification, and Tibetan speech synthesis. The resource consists of 30 hours of Lhasa-u-Tsang dialect; 8.7 hours of Kham dialect, comprising 3.4 hours of Yushu dialect, 3.3 hours of Dege dialect, and 2 hours of Changdu dialect; and 10 hours of Amdo pastoral dialect. Additional resources are provided for the Lhasa-u-Tsang dialect, including a phoneme set, a pronunciation dictionary, and the code for building the Lhasa-u-Tsang speech recognition baseline system. For Tibetan multi-dialect and multitask speech recognition, the code and recognition results based on WaveNet-connectionist temporal classification (WaveNet-CTC) are also provided. All the resources are free and publicly available to researchers, which helps compensate for the shortage of public Tibetan multi-dialect speech resources and promotes the development of Tibetan multi-dialect speech processing technology.
Pages: 297-304
Number of pages: 8