Swahili Speech Dataset Development and Improved Pre-training Method for Spoken Digit Recognition

Cited by: 1
Authors
Kivaisi, Alexander R. [1]
Zhao, Qingjie [1]
Mbelwa, Jimmy T. [2]
Affiliations
[1] Beijing Inst Technol, 5 South St Zhongguancun, Beijing 100081, Peoples R China
[2] Univ Dar Es Salaam, Dept Comp Sci & Engn, Ali Hassan Mwinyi Rd POB 33335, Dar Es Salaam, Tanzania
Keywords
Swahili language; pre-training; cross-lingual; multi-lingual; low-resource language; spoken digit recognition; convolutional neural network
DOI
10.1145/3597494
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech datasets are an essential component in building commercial speech applications. However, low-resource languages such as Swahili lack the speech datasets needed for spoken digit recognition, and for languages where such resources do exist, they are usually insufficient. Pre-training methods that exploit external resources have therefore been used to improve continuous speech recognition. However, to the best of our knowledge, no study has investigated the effect of pre-training methods specifically on spoken digit recognition. This study addresses these problems. First, we developed a Swahili spoken digit dataset for Swahili spoken digit recognition. Then, we investigated the effect of cross-lingual and multi-lingual pre-training methods on spoken digit recognition. Finally, we proposed an effective language-independent pre-training method for spoken digit recognition. The proposed method incorporates target-language data during the pre-training stage, which leads to an optimal solution when less training data is used. Experiments on the newly developed Swahili dataset and on English and Gujarati datasets show that our method achieves better performance than all the baselines considered in this study.
Pages: 24
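The abstract contrasts cross-lingual pre-training, multi-lingual pre-training, and a proposed method that also uses target-language data during pre-training. The Python sketch below is a hypothetical illustration, not the authors' implementation: it shows only how the pre-training data pool might differ between the three settings, assuming cross-lingual means one external language, multi-lingual means several external languages pooled, and the target-inclusive variant adds the target-language training split on top. The function name pretraining_pool, the (file, digit label) utterance format, the setting names, and the use of English and Gujarati as external languages for a Swahili target are all assumptions. In each setting, the pre-trained model would presumably then be trained further on the target-language data.

```python
from typing import Dict, List, Tuple

# Hypothetical utterance format: (audio file identifier, digit label 0-9).
Utterance = Tuple[str, int]


def pretraining_pool(
    setting: str,
    external: Dict[str, List[Utterance]],
    target_train: List[Utterance],
    source_language: str = "english",
) -> List[Utterance]:
    """Return the data used in the pre-training stage for one setting (assumed definitions).

    cross-lingual:    a single external (source) language only.
    multi-lingual:    all external languages pooled together.
    target-inclusive: all external languages plus the target-language
                      training split (a hypothetical reading of the abstract).
    """
    if setting == "cross-lingual":
        return list(external[source_language])
    # Pool external languages in a fixed (alphabetical) order for reproducibility.
    pooled = [u for lang in sorted(external) for u in external[lang]]
    if setting == "multi-lingual":
        return pooled
    if setting == "target-inclusive":
        return pooled + list(target_train)
    raise ValueError(f"unknown setting: {setting!r}")


# Usage with toy, made-up file names: Swahili is the target language here.
external = {
    "english": [("en_0.wav", 0), ("en_1.wav", 1)],
    "gujarati": [("gu_0.wav", 0)],
}
swahili_train = [("sw_0.wav", 0)]
print(len(pretraining_pool("target-inclusive", external, swahili_train)))  # 4
```

Keeping the settings as one function makes the only assumed difference between them, which data enters pre-training, explicit; the model architecture (a convolutional neural network, per the keywords) and the training loop are deliberately left out.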