The impact of MFCC, spectrogram, and Mel-Spectrogram on deep learning models for Amazigh speech recognition system

Cited by: 0
Authors
Meryam Telmem [1]
Naouar Laaidi [2]
Hassan Satori [2]
Affiliations
[1] Université Moulay Ismail de Meknes
[2] Sidi Mohamed Ben Abdellah University
Keywords
MFCC; Spectrogram; Mel-Spectrogram; CNN; LSTM; bi-LSTM; Amazigh language
DOI
10.1007/s10772-025-10183-3
Abstract
Feature extraction is an essential phase in the development of Automatic Speech Recognition (ASR) systems. This study examines the performance of different deep neural network architectures, including Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and bidirectional LSTM (Bi-LSTM) models, for an Amazigh speech recognition system. It evaluates how several feature extraction techniques, specifically Mel-Frequency Cepstral Coefficients (MFCC), Spectrograms, and Mel-Spectrograms, affect the performance of these architectures. The results show that the Bi-LSTM with Spectrograms achieved a maximum accuracy of 85%, the best performance in our Amazigh ASR study. We also show that each feature type offers specific advantages, influenced by the particular neural network architecture employed.
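To make the three feature types named in the abstract concrete, the following is a minimal NumPy-only sketch of how a power spectrogram, a Mel-spectrogram, and MFCCs can be derived from one waveform. It is not the authors' pipeline: the abstract gives no extraction settings, so the sample rate (16 kHz), frame length (`n_fft=512`), hop size (`hop=160`), number of Mel bands (`n_mels=40`), number of coefficients (`n_mfcc=13`), and all function names are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Mel scale formula.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def power_spectrogram(signal, n_fft=512, hop=160):
    # Slice the signal into overlapping Hann-windowed frames, then take |FFT|^2.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)

def mel_filterbank(sr, n_fft, n_mels=40):
    # Triangular filters whose centers are evenly spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # peak and falling edge
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def dct_ii(x, n_out):
    # Unnormalized DCT-II along the last axis, keeping the first n_out coefficients.
    n = x.shape[1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return x @ basis.T

def extract_features(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    # All default parameter values here are illustrative assumptions.
    spec = power_spectrogram(signal, n_fft, hop)
    mel_spec = spec @ mel_filterbank(sr, n_fft, n_mels).T
    mfcc = dct_ii(np.log(mel_spec + 1e-10), n_mfcc)
    return spec, mel_spec, mfcc

# Usage on a synthetic one-second 440 Hz tone (stand-in for a speech recording).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec, mel_spec, mfcc = extract_features(tone, sr=sr)
print(spec.shape, mel_spec.shape, mfcc.shape)  # (97, 257) (97, 40) (97, 13)
```

The three outputs differ mainly in resolution and compactness: the spectrogram keeps every linear frequency bin, the Mel-spectrogram pools those bins into perceptually spaced bands, and the MFCCs decorrelate the log Mel energies into a few coefficients per frame. Any of the three can be fed to a CNN, LSTM, or Bi-LSTM as a frames-by-features array.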
Pages: 299-312
Number of pages: 13