Swahili Speech Dataset Development and Improved Pre-training Method for Spoken Digit Recognition

Cited by: 1
Authors
Kivaisi, Alexander R. [1 ]
Zhao, Qingjie [1 ]
Mbelwa, Jimmy T. [2 ]
Institutions
[1] Beijing Institute of Technology, 5 South Street, Zhongguancun, Beijing 100081, People's Republic of China
[2] University of Dar es Salaam, Department of Computer Science and Engineering, Ali Hassan Mwinyi Road, P.O. Box 33335, Dar es Salaam, Tanzania
Keywords
Swahili language; pre-training; cross-lingual; multi-lingual; low-resource language; spoken digit recognition; convolutional neural network
DOI
10.1145/3597494
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Numbers
081104; 0812; 0835; 1405
Abstract
A speech dataset is an essential component in building commercial speech applications. However, low-resource languages such as Swahili lack such resources, which are vital for spoken digit recognition, and even for languages where such resources exist, they are often insufficient. Pre-training methods with external resources have therefore been used to improve continuous speech recognition. However, to the best of our knowledge, no study has investigated the effect of pre-training methods specifically on spoken digit recognition. This study addresses these problems. First, we developed a Swahili spoken digit dataset for Swahili spoken digit recognition. Then, we investigated the effect of cross-lingual and multi-lingual pre-training methods on spoken digit recognition. Finally, we propose an effective language-independent pre-training method for spoken digit recognition. The proposed method has the advantage of incorporating target-language data during the pre-training stage, which leads to a better solution when less training data is available. Experiments on the Swahili dataset (developed here), English, and Gujarati datasets show that our method achieves better performance than all the baselines considered in this study.
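To make the two-stage idea in the abstract concrete, here is a minimal toy sketch of pre-training on pooled source-language digits plus a small amount of target-language data, followed by fine-tuning on the target-language data alone. Everything in it is an illustrative assumption, not the paper's actual method: the "acoustic features" are synthetic Gaussian vectors drawn from one shared distribution (so the source/target split is only schematic), and the model is a single softmax layer rather than the CNN the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_CLASSES = 20, 10  # feature size, digits 0-9

# Shared per-digit means so every split comes from one toy distribution.
CLASS_MEANS = rng.normal(scale=2.0, size=(N_CLASSES, DIM))

def make_dataset(n):
    """Synthetic stand-in for digit utterances (real systems: MFCC/log-mel)."""
    y = rng.integers(0, N_CLASSES, size=n)
    X = CLASS_MEANS[y] + rng.normal(size=(n, DIM))
    return X, y

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(W, X, y, lr=0.1, epochs=200):
    """Softmax classifier trained by full-batch gradient descent."""
    Y = np.eye(N_CLASSES)[y]
    for _ in range(epochs):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / len(X)  # cross-entropy gradient
    return W

def accuracy(W, X, y):
    return float((np.argmax(X @ W, axis=1) == y).mean())

X_src, y_src = make_dataset(2000)  # pooled "source-language" digits
X_tgt, y_tgt = make_dataset(50)    # small "target-language" set

# Stage 1: pre-train on source data mixed with the target data
# (the language-independent idea from the abstract, drastically simplified).
W = np.zeros((DIM, N_CLASSES))
W = train(W, np.vstack([X_src, X_tgt]), np.concatenate([y_src, y_tgt]))

# Stage 2: fine-tune on the small target-language set only.
W = train(W, X_tgt, y_tgt, lr=0.05, epochs=50)

X_test, y_test = make_dataset(500)
acc = accuracy(W, X_test, y_test)
print(f"target-language test accuracy: {acc:.2f}")
```

The key structural point the sketch shows is that the target-language data appears in both stages, so fine-tuning starts from weights already adapted toward the target distribution rather than from a purely foreign initialization.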
Pages: 24