Exploring data augmentation for Amazigh speech recognition with convolutional neural networks

Cited by: 0

Authors:

Hossam Boulal [1]

Farida Bouroumane [1]

Mohamed Hamidi [2]

Jamal Barkani [1]

Mustapha Abarkan [1]

Affiliations:

[1] FP Taza, LSI Laboratory

[2] USMBA University, Team of Modeling and Scientific Computing

[3] FPN

[4] UMP

Source:

International Journal of Speech Technology | 2025 / Vol. 28 / Issue 1

Keywords:

Speech recognition; Data augmentation; Deep learning; Feature extraction; Amazigh digits

DOI:

10.1007/s10772-024-10164-y

Abstract:

In speech recognition, improving accuracy matters for diverse linguistic communities. This study aims to improve Amazigh speech recognition using three data augmentation methods: Audio Augmentation, FilterBank Augmentation, and SpecAugment. Convolutional Neural Networks (CNNs) perform the recognition, taking Mel spectrograms extracted from the audio files as input. The study targets recognition of the first ten Amazigh digits. Experiments followed a speaker-independent approach with 42 participants. In total, 27 experiments were run on both original and augmented data. Among the CNN models tested, VGG19 showed significant promise. The maximum accuracy reached was 95.66%, and the largest improvement obtained through data augmentation was 4.67%. These findings indicate that the proposed methods substantially improve speech recognition accuracy.
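
As a rough illustration of the SpecAugment idea named in the abstract, the sketch below masks one random band of mel bins and one random span of frames in a spectrogram. It is not the authors' implementation: the abstract gives no settings, so the function name `spec_augment`, the parameters `max_freq_mask` and `max_time_mask`, and the choice to fill masked regions with zeros are all assumptions made for illustration. The time-warping step of the original SpecAugment method is left out.

```python
import numpy as np


def spec_augment(mel, max_freq_mask=8, max_time_mask=16, rng=None):
    """Return a masked copy of `mel`, an array of shape (n_mels, n_frames).

    Hypothetical helper: one frequency mask and one time mask, zero-filled.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = mel.copy()  # leave the caller's spectrogram untouched
    n_mels, n_frames = out.shape

    # Frequency mask: zero f consecutive mel bins starting at bin f0.
    f = int(rng.integers(0, min(max_freq_mask, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: zero t consecutive frames starting at frame t0.
    t = int(rng.integers(0, min(max_time_mask, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0

    return out


# Usage with a dummy 40-bin, 100-frame spectrogram of ones.
mel = np.ones((40, 100))
augmented = spec_augment(mel, rng=np.random.default_rng(0))
```

Each call draws new mask positions, so applying it repeatedly to the same spectrogram yields several distinct training examples from one recording.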

Pages: 53-65

Page count: 12

