Swahili Speech Dataset Development and Improved Pre-training Method for Spoken Digit Recognition

Cited by: 1

Authors:

Kivaisi, Alexander R. [1]

Zhao, Qingjie [1]

Mbelwa, Jimmy T. [2]

Affiliations:

[1] Beijing Institute of Technology, 5 South Street, Zhongguancun, Beijing 100081, People's Republic of China

[2] University of Dar es Salaam, Department of Computer Science and Engineering, Ali Hassan Mwinyi Road, P.O. Box 33335, Dar es Salaam, Tanzania

Source:

ACM Transactions on Asian and Low-Resource Language Information Processing | 2023, Vol. 22, No. 7

Keywords:

Swahili language; pre-training; cross-lingual; multi-lingual; low-resource language; spoken digit recognition; convolutional neural network;

DOI:

10.1145/3597494

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A speech dataset is an essential component in building commercial speech applications. However, low-resource languages such as Swahili lack such a resource, which is vital for spoken digit recognition. Even for languages where such resources exist, they are usually insufficient. Pre-training methods have therefore been used with external resources to improve continuous speech recognition. However, to the best of our knowledge, no study has investigated the effect of pre-training methods specifically on spoken digit recognition. This study addresses these problems. First, we developed a Swahili spoken digit dataset for Swahili spoken digit recognition. Then, we investigated the effect of cross-lingual and multi-lingual pre-training methods on spoken digit recognition. Finally, we proposed an effective language-independent pre-training method for spoken digit recognition. The proposed method has the advantage of incorporating target-language data during the pre-training stage, which leads to an optimal solution when using less training data. Experiments on the Swahili (developed in this study), English, and Gujarati datasets show that our method achieves better performance than all the baselines considered in this study.

Pages: 24
