The impact of MFCC, spectrogram, and Mel-Spectrogram on deep learning models for Amazigh speech recognition system

Authors
Meryam Telmem [1]
Naouar Laaidi [2]
Hassan Satori [2]
Affiliations
[1] Université Moulay Ismail de Meknes
[2] Sidi Mohamed Ben Abdellah University
Keywords
MFCC; Spectrogram; Mel-Spectrogram; CNN; LSTM; bi-LSTM; Amazigh language
DOI
10.1007/s10772-025-10183-3
Abstract
Feature extraction is an essential phase in the development of Automatic Speech Recognition (ASR) systems. This study examines the performance of different deep neural network architectures, including Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and bidirectional LSTM (Bi-LSTM) models, for an Amazigh speech recognition system when several feature extraction techniques are applied, specifically Mel-Frequency Cepstral Coefficients (MFCC), Spectrograms, and Mel-Spectrograms. The results show that the Bi-LSTM with Spectrograms achieved a maximum accuracy of 85%, the best performance in our Amazigh speech recognition study. We also show that each feature type offers specific advantages, influenced by the particular neural network architecture employed.
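As a rough illustration of how the three feature types named in the abstract relate to one another, the sketch below derives a power spectrogram, a Mel-spectrogram, and MFCCs from the same waveform using only NumPy. It is not the authors' pipeline: the 16 kHz sample rate, 512-point FFT, 160-sample hop, 40 mel bands, 13 coefficients, and all function names are assumptions chosen for the example, and the input is a synthetic tone rather than Amazigh speech.

```python
import numpy as np


def power_spectrogram(signal, n_fft=512, hop=160):
    """Short-time power spectrum, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T


def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def dct_ii(x, n_out):
    """Unnormalized DCT-II along axis 0, keeping the first n_out coefficients."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return basis @ x


def extract_features(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    """Return (spectrogram, mel_spectrogram, mfcc); all parameter values are assumptions."""
    spec = power_spectrogram(signal, n_fft, hop)
    mel_spec = mel_filterbank(sr, n_fft, n_mels) @ spec  # pool FFT bins into mel bands
    log_mel = np.log(mel_spec + 1e-10)                   # small offset avoids log(0)
    mfcc = dct_ii(log_mel, n_mfcc)                       # decorrelate the log-mel bands
    return spec, mel_spec, mfcc


# Synthetic 1-second 440 Hz tone standing in for a speech recording.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec, mel_spec, mfcc = extract_features(tone, sr)
print(spec.shape, mel_spec.shape, mfcc.shape)
```

Each step discards some detail: the spectrogram keeps every FFT bin, the Mel-spectrogram pools those bins into perceptually spaced bands, and the MFCCs keep only the first few cosine-transform coefficients of the log band energies. That difference in resolution is one plausible reason a given architecture might favor one representation over another.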
Pages: 299-312
Number of pages: 13