The impact of MFCC, spectrogram, and Mel-Spectrogram on deep learning models for Amazigh speech recognition system

Authors
Meryam Telmem [1]
Naouar Laaidi [2]
Hassan Satori [2]
Affiliations
[1] Université Moulay Ismail de Meknes
[2] Sidi Mohamed Ben Abdellah University
Keywords
MFCC; Spectrogram; Mel-Spectrogram; CNN; LSTM; bi-LSTM; Amazigh language
DOI
10.1007/s10772-025-10183-3
Abstract
Feature extraction is an essential phase in the development of Automatic Speech Recognition (ASR) systems. This study examines the performance of several deep neural network architectures, namely Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and bidirectional LSTM (Bi-LSTM) models, in an Amazigh speech recognition system. Each architecture is evaluated with three feature extraction techniques: Mel-Frequency Cepstral Coefficients (MFCC), spectrograms, and Mel-spectrograms. The results show that the Bi-LSTM with spectrograms achieved the highest accuracy, 85%, the best performance in our Amazigh speech recognition study. We also show that each feature type offers specific advantages, depending on the neural network architecture employed.
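To make the three feature types named in the abstract concrete, the following is a minimal NumPy sketch of how a power spectrogram, a log Mel-spectrogram, and MFCCs relate to one another. It is not the authors' pipeline: the frame size (512), hop (160), 40 Mel filters, 13 coefficients, 16 kHz sample rate, and all function names are assumptions chosen for illustration, and the input is a synthetic tone rather than Amazigh speech.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel-scale formula (assumed here; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrogram(signal, n_fft=512, hop=160):
    """Short-time power spectrum, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def mel_filterbank(sr, n_fft=512, n_mels=40):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, bins)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return x @ basis.T

def extract_features(signal, sr, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    spec = power_spectrogram(signal, n_fft, hop)            # spectrogram
    mel_spec = spec @ mel_filterbank(sr, n_fft, n_mels).T   # Mel-spectrogram
    log_mel = np.log(mel_spec + 1e-10)                      # log compression
    mfcc = dct_ii(log_mel, n_mfcc)                          # MFCC
    return spec, log_mel, mfcc

# Synthetic 1-second, 440 Hz tone standing in for a speech recording.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec, log_mel, mfcc = extract_features(tone, sr)
print(spec.shape, log_mel.shape, mfcc.shape)  # (97, 257) (97, 40) (97, 13)
```

Each step discards some detail: the Mel filterbank pools 257 frequency bins into 40 bands, and the DCT keeps only 13 coefficients per frame. This difference in dimensionality and structure is one plausible reason a given architecture may favor one representation over another, as the abstract reports.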
Pages: 299-312
Number of pages: 13