Reverb and Noise as Real-World Effects in Speech Recognition Models: A Study and a Proposal of a Feature Set

Cited by: 0
Authors
Cesarini, Valerio [1 ]
Costantini, Giovanni [1 ]
Affiliations
[1] Univ Roma Tor Vergata, Dept Elect Engn, I-00133 Rome, Italy
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 23
Keywords
speaker recognition; data augmentation; noise; reverb; MFCC; RASTA; speaker verification; SVM
DOI
10.3390/app142311446
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Reverberation and background noise are common, unavoidable real-world phenomena that hinder automatic speaker recognition systems, particularly because these systems are typically trained on noise-free data. Most models rely on fixed audio feature sets. To evaluate how strongly such features depend on reverberation and noise, this study proposes augmenting the commonly used mel-frequency cepstral coefficients (MFCCs) with relative spectral (RASTA) features. The performance of these features was assessed using noisy data generated by applying reverberation and pink noise to the DEMoS dataset, which includes 56 speakers. Verification models were trained on clean data using MFCCs, RASTA features, or their combination as inputs. The models were then validated on augmented data with progressively increasing noise and reverberation levels. The results indicate that MFCCs struggle to identify the main speaker, while the RASTA method has difficulty with the opposite class. The hybrid feature set, derived from their combination, gives the best overall performance as a compromise between the two. Although the MFCC method is the standard and performs well on clean training data, it shows a significant tendency to misclassify the main speaker in real-world scenarios, which is a critical limitation for modern user-centric verification applications. The hybrid feature set therefore proves effective as a balanced solution that optimizes both sensitivity and specificity.
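The abstract does not give the augmentation or scoring procedure in detail, so the following is only a minimal sketch of the general idea: mixing pink noise into a clean signal at a chosen level, then scoring a verifier by sensitivity and specificity. The function names, the use of a target signal-to-noise ratio in dB as the "noise level", and the spectral-shaping method for pink noise are all assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np

def pink_noise(n_samples, rng):
    """Approximate pink (1/f power) noise by shaping white noise in the frequency domain.
    Hypothetical helper; the paper's noise generator is not specified in the abstract."""
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]              # avoid division by zero at the DC bin
    spectrum = spectrum / np.sqrt(freqs)  # amplitude ~ 1/sqrt(f), so power ~ 1/f
    noise = np.fft.irfft(spectrum, n=n_samples)
    return noise / np.max(np.abs(noise))

def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so the mixture has the requested signal-to-noise ratio (assumed control knob)."""
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + np.sqrt(target_noise_power / noise_power) * noise

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN) for the main speaker; specificity = TN / (TN + FP) for the opposite class."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    return tp / (tp + fn), tn / (tn + fp)

# Usage: a synthetic tone stands in for a clean utterance, degraded at decreasing SNR.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noisy_versions = [add_noise_at_snr(clean, pink_noise(clean.size, rng), snr) for snr in (20, 10, 0)]
```

Read this way, a model that often rejects the main speaker has low sensitivity, and one that often accepts other speakers has low specificity, which is the trade-off the abstract attributes to MFCC and RASTA features respectively.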
Pages: 23