Speech Translation From Darija to Modern Standard Arabic: Fine-Tuning Whisper on the Darija-C Corpus Across Six Model Versions

Cited by: 0
Authors
Labied, Maria [1]
Belangour, Abdessamad [1]
Banane, Mouad [2]
Affiliations
[1] Hassan II Univ Casablanca, Lab Informat Technol & Modeling LTIM, Fac Sci Ben MSik, Casablanca 20360, Morocco
[2] Hassan II Univ, Fac Legal Econ & Social Sci, Lab Artificial Intelligence & Complex Syst Engn, Casablanca 20360, Morocco
Source
IEEE Access | 2025 / Vol. 13
Keywords
Translation; Computational modeling; Standards; Adaptation models; Acoustics; Speech processing; Decoding; Accuracy; Transformers; Modeling; Speech translation; speech-to-text Whisper model; fine-tuning; low-resource dialects; Moroccan Arabic; Darija; Modern Standard Arabic; whisper-large; whisper-medium; whisper-small
DOI
10.1109/ACCESS.2025.3551229
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This study explores fine-tuning six versions of the Whisper model (small, medium, large, large-v2, large-v3, and turbo) for speech translation from Moroccan Darija to Modern Standard Arabic (MSA). Our primary goal is to evaluate how well these variants translate Darija speech into accurate and coherent MSA text, shedding light on the trade-offs between model capacity and translation quality. The experiments are conducted on the Darija-C Corpus, a specialized dataset designed to capture the linguistic nuances of Darija and its relationship with MSA. We also analyze computational efficiency, memory usage, and training time to give a clear view of model deployment in resource-constrained environments. The study provides insights for developing robust Darija-to-MSA speech translation systems and highlights the broader potential of fine-tuning Whisper for low-resource language pairs.
Pages: 48656-48671
Page count: 16