Exploring data augmentation for Amazigh speech recognition with convolutional neural networks

Authors
Hossam Boulal [1]
Farida Bouroumane [1]
Mohamed Hamidi [2]
Jamal Barkani [1]
Mustapha Abarkan [1]
Affiliations
[1] FP Taza, LSI Laboratory
[2] USMBA University, Team of Modeling and Scientific Computing
[3] FPN
[4] UMP
Keywords
Speech recognition; Data augmentation; Deep learning; Feature extraction; Amazigh digits
DOI
10.1007/s10772-024-10164-y
Abstract
In speech recognition, improving accuracy matters for diverse linguistic communities. Our study addresses this need by improving Amazigh speech recognition with three data augmentation methods: Audio Augmentation, FilterBank Augmentation, and SpecAugment. We use Convolutional Neural Networks (CNNs) for speech recognition, working from Mel spectrograms extracted from the audio files. The study targets recognition of the first ten Amazigh digits. Experiments followed a speaker-independent approach involving 42 participants. In total, 27 experiments were conducted, using both original and augmented data. Among the CNN models employed, the VGG19 model showed significant promise. Our results show a maximum accuracy of 95.66%, and the largest improvement achieved through data augmentation was 4.67%. These findings indicate that the proposed methods are effective at improving speech recognition accuracy.
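As an illustration of one of the augmentation methods named in the abstract, the following is a minimal sketch of SpecAugment-style frequency and time masking applied to a Mel spectrogram stored as a NumPy array. The function name `spec_augment`, its parameters, the mask widths, and the fill value of 0.0 are all assumptions made for this example; the abstract does not give the authors' settings, and the time-warping step of SpecAugment is omitted here.

```python
import numpy as np


def spec_augment(mel, num_freq_masks=1, num_time_masks=1,
                 max_freq_width=8, max_time_width=20, rng=None):
    """Return a masked copy of `mel`, shaped (n_mels, n_frames).

    Hypothetical helper: widths and the 0.0 fill value are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = mel.copy()  # leave the caller's array untouched
    n_mels, n_frames = out.shape

    # Frequency masking: zero out a random band of consecutive Mel bins.
    for _ in range(num_freq_masks):
        width = min(int(rng.integers(0, max_freq_width + 1)), n_mels)
        start = int(rng.integers(0, n_mels - width + 1))
        out[start:start + width, :] = 0.0

    # Time masking: zero out a random run of consecutive frames.
    for _ in range(num_time_masks):
        width = min(int(rng.integers(0, max_time_width + 1)), n_frames)
        start = int(rng.integers(0, n_frames - width + 1))
        out[:, start:start + width] = 0.0

    return out


# Usage on a placeholder spectrogram (40 Mel bins x 100 frames of ones).
mel = np.ones((40, 100))
augmented = spec_augment(mel, rng=np.random.default_rng(0))
```

Each call draws new mask positions, so applying the function several times to the same spectrogram yields several distinct training examples while the digit label stays unchanged.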
Pages: 53-65
Number of pages: 12