Reverb and Noise as Real-World Effects in Speech Recognition Models: A Study and a Proposal of a Feature Set

Cited by: 0
Authors
Cesarini, Valerio [1]
Costantini, Giovanni [1]
Affiliations
[1] Univ Roma Tor Vergata, Dept Elect Engn, I-00133 Rome, Italy
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 23
Keywords
speaker recognition; data augmentation; noise; reverb; MFCC; RASTA; speaker verification; SVM
DOI
10.3390/app142311446
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Reverberation and background noise are common and unavoidable real-world phenomena that hinder automatic speaker recognition systems, particularly because these systems are typically trained on noise-free data. Most models rely on fixed audio feature sets. To evaluate the dependency of features on reverberation and noise, this study proposes augmenting the commonly used mel-frequency cepstral coefficients (MFCCs) with relative spectral (RASTA) features. The performance of these features was assessed using noisy data generated by applying reverberation and pink noise to the DEMoS dataset, which includes 56 speakers. Verification models were trained on clean data using MFCCs, RASTA features, or their combination as inputs, and were validated on augmented data with progressively increasing noise and reverberation levels. The results indicate that MFCCs struggle to identify the main speaker, while the RASTA method has difficulty with the opposite class. The hybrid feature set, derived from their combination, demonstrates the best overall performance as a compromise between the two. Although the MFCC method is the standard and performs well on clean training data, it shows a significant tendency to misclassify the main speaker in real-world scenarios, which is a critical limitation for modern user-centric verification applications. The hybrid feature set therefore proves effective as a balanced solution, optimizing both sensitivity and specificity.
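The abstract frames its results in terms of sensitivity (recognising the main speaker) and specificity (rejecting the opposite class). As a minimal sketch of how these two figures could be computed for a binary verification task, the following uses made-up labels; the function name, the label encoding (1 = main speaker, 0 = other speaker), and the example data are assumptions for illustration and are not taken from the paper.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary verification labels.

    Sensitivity = TP / (TP + FN): fraction of main-speaker samples accepted.
    Specificity = TN / (TN + FP): fraction of other-speaker samples rejected.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes to avoid division by zero.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical labels: 1 = main speaker, 0 = any other speaker.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case the classifier misses half of the main-speaker samples while rejecting most impostors, which is the kind of imbalance the abstract attributes to MFCC-only models under noise.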
Pages: 23