Reverb and Noise as Real-World Effects in Speech Recognition Models: A Study and a Proposal of a Feature Set

Cited by: 0
Authors
Cesarini, Valerio [1]
Costantini, Giovanni [1]
Affiliations
[1] Univ Roma Tor Vergata, Dept Elect Engn, I-00133 Rome, Italy
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 23
Keywords
speaker recognition; data augmentation; noise; reverb; MFCC; RASTA; speaker verification; SVM
DOI
10.3390/app142311446
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Reverberation and background noise are common and unavoidable real-world phenomena that hinder automatic speaker recognition systems, particularly because these systems are typically trained on noise-free data. Most models rely on fixed audio feature sets. To evaluate how strongly such features depend on reverberation and noise, this study proposes augmenting the commonly used mel-frequency cepstral coefficients (MFCCs) with relative spectral (RASTA) features. The performance of these features was assessed on noisy data generated by applying reverberation and pink noise to the DEMoS dataset, which includes 56 speakers. Verification models were trained on clean data using MFCCs, RASTA features, or their combination as inputs, and were validated on augmented data with progressively increasing noise and reverberation levels. The results indicate that MFCCs struggle to identify the main speaker, while the RASTA method has difficulty with the opposite class. The hybrid feature set, derived from their combination, shows the best overall performance as a compromise between the two. Although the MFCC method is the standard and performs well on clean training data, it shows a significant tendency to misclassify the main speaker in real-world scenarios, which is a critical limitation for modern user-centric verification applications. The hybrid feature set therefore proves effective as a balanced solution that optimizes both sensitivity and specificity.
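The augmentation and evaluation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state how the pink noise or the reverberation were generated, so the spectral-shaping noise generator, the synthetic exponentially decaying impulse response, and all function and parameter names below (`pink_noise`, `add_noise_at_snr`, `add_reverb`, `rt60`, `wet`) are assumptions chosen for illustration, using only NumPy.

```python
import numpy as np

def pink_noise(n, rng):
    """Approximate pink (1/f power) noise by shaping white noise in the frequency domain."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]               # avoid division by zero at DC
    spectrum = spectrum / np.sqrt(freqs)  # amplitude ~ 1/sqrt(f), so power ~ 1/f
    noise = np.fft.irfft(spectrum, n)
    return noise / np.std(noise)

def add_noise_at_snr(signal, noise, snr_db):
    """Scale the noise so the mixture has the requested signal-to-noise ratio (dB)."""
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise

def add_reverb(signal, sr, rt60, wet, rng):
    """Assumed reverb model: convolve with decaying noise whose level drops 60 dB after rt60 s."""
    n_ir = int(sr * rt60)
    t = np.arange(n_ir) / sr
    ir = rng.standard_normal(n_ir) * 10 ** (-3 * t / rt60)
    ir = ir / np.sqrt(np.sum(ir ** 2))            # unit-energy impulse response
    tail = np.convolve(signal, ir)[: len(signal)]
    return (1 - wet) * signal + wet * tail        # dry/wet mix

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (main speaker recall) and specificity (recall on the opposite class)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return float(tp / (tp + fn)), float(tn / (tn + fp))

# Hypothetical usage on a synthetic one-second tone standing in for clean speech.
rng = np.random.default_rng(0)
sr = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
noisy_only = add_noise_at_snr(clean, pink_noise(len(clean), rng), snr_db=10.0)
augmented = add_reverb(noisy_only, sr, rt60=0.5, wet=0.3, rng=rng)
```

Lowering `snr_db` or raising `rt60` and `wet` step by step would give a progressively degraded validation set of the kind the abstract describes; features extracted from `augmented` would then be scored by a model trained on clean data only, and `sensitivity_specificity` summarizes the two error types the abstract contrasts.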
Pages: 23
Related papers
50 in total
  • [31] Impact of the choice of clinical code set on the outcomes of a real-world database study
    Artignan, Audrey
    Buchanan-Hughes, Amy
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2020, 29 : 416 - 417
  • [32] An Experimental Study on Structural-MAP Approaches to Implementing Very Large Vocabulary Speech Recognition Systems for Real-World Tasks
    Chen, I-Fan
    Siniscalchi, Sabato Marco
    Moon, Seokyong
    Shin, Daejin
    Koo, Myong-Wan
    Chung, Minhwa
    Lee, Chin-Hui
    2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [33] Evidence-Based Occupational Hearing Screening I: Modeling the Effects of Real-World Noise Environments on the Likelihood of Effective Speech Communication
    Soli, Sigfrid D.
    Giguere, Christian
    Laroche, Chantal
    Vaillancourt, Veronique
    Dreschler, Wouter A.
    Rhebergen, Koenraad S.
    Harkins, Kevin
    Ruckstuhl, Mark
    Ramulu, Pradeep
    Meyers, Lawrence S.
    EAR AND HEARING, 2018, 39 (03): : 436 - 448
  • [34] Auditory pathway model and its VLSI implementation for robust speech recognition in real-world noisy environment
    Lee, SY
    Kim, CM
    Won, YG
    Park, HM
    PROCEEDINGS OF 2003 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS & SIGNAL PROCESSING, PROCEEDINGS, VOLS 1 AND 2, 2003, : 1728 - 1733
  • [35] Real-World and Rapid Face Recognition Toward Pose and Expression Variations via Feature Library Matrix
    Moeini, Ali
    Moeini, Hossein
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2015, 10 (05) : 969 - 984
  • [36] Open-set face recognition across look-alike faces in real-world scenarios
    Moeini, Ali
    Faez, Karim
    Moeini, Hossein
    Safai, Armon Matthew
    IMAGE AND VISION COMPUTING, 2017, 57 : 1 - 14
  • [37] Implementation of a real-world based ICF set for the rehabilitation of respiratory diseases: a pilot study
    Vitacca, Michele
    Giardini, Anna
    Corica, Giacomo
    Ceriana, Piero
    Carone, Mauro
    Balbi, Bruno
    Fracchia, Claudio
    Maniscalco, Mauro
    Fanfulla, Francesco
    Sarno, Nicola
    Raccanelli, Rita
    Traversoni, Silvia
    Spanevello, Antonio
    MINERVA MEDICA, 2020, 111 (03) : 239 - 244
  • [38] Towards robust paralinguistic assessment for real-world mobile health (mHealth) monitoring: an initial study of reverberation effects on speech
    Dineley, Judith
    Carr, Ewan
    Matcham, Faith
    Downs, Johnny
    Dobson, Richard
    Quatieri, Thomas F.
    Cummins, Nicholas
    INTERSPEECH 2023, 2023, : 2373 - 2377
  • [39] A Reusable Set of Real-World Product Line Case Studies for Comparing Variability Models in Research and Practice
    Meixner, Kristof
    Feichtinger, Kevin
    Rabiser, Rick
    Biffl, Stefan
    SPLC '21 - PROCEEDINGS OF THE 25TH ACM INTERNATIONAL SYSTEMS AND SOFTWARE PRODUCT LINE CONFERENCE, VOL B, 2021, : 105 - 112
  • [40] Fast offline transformer-based end-to-end automatic speech recognition for real-world applications
    Oh, Yoo Rhee
    Park, Kiyoung
    Park, Jeon Gue
    ETRI JOURNAL, 2022, 44 (03) : 476 - 490