Reverb and Noise as Real-World Effects in Speech Recognition Models: A Study and a Proposal of a Feature Set

Authors
Cesarini, Valerio [1]
Costantini, Giovanni [1]
Affiliation
[1] Univ Roma Tor Vergata, Dept Elect Engn, I-00133 Rome, Italy
Source
Applied Sciences-Basel, 2024, Vol. 14, No. 23
Keywords
speaker recognition; data augmentation; noise; reverb; MFCC; RASTA; speaker verification; SVM
DOI
10.3390/app142311446
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Reverberation and background noise are common, unavoidable real-world phenomena that hinder automatic speaker recognition systems, particularly because these systems are typically trained on noise-free data. Most models rely on fixed audio feature sets. To evaluate how strongly features depend on reverberation and noise, this study proposes augmenting the commonly used mel-frequency cepstral coefficients (MFCCs) with relative spectral (RASTA) features. The performance of these features was assessed on noisy data generated by applying reverberation and pink noise to the DEMoS dataset, which includes 56 speakers. Verification models were trained on clean data using MFCCs, RASTA features, or their combination as inputs, and were validated on augmented data with progressively increasing levels of noise and reverberation. The results indicate that MFCC-based models struggle to identify the main speaker, whereas RASTA-based models have difficulty with the opposite class. The hybrid feature set, derived from their combination, achieves the best overall performance as a compromise between the two. Although MFCCs are the standard choice and perform well on clean data, they show a significant tendency to misclassify the main speaker in real-world scenarios, a critical limitation for modern user-centric verification applications. The hybrid feature set therefore proves effective as a balanced solution, optimizing both sensitivity and specificity.
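The abstract does not specify how reverberation and pink noise were applied to the clean recordings. As a rough illustration only, the sketch below assumes a synthetic room impulse response (Gaussian noise under an exponential envelope that decays by 60 dB over a chosen RT60) followed by pink noise scaled to a target signal-to-noise ratio. The function names and the parameters `rt60` and `snr_db` are hypothetical and are not taken from the paper.

```python
import numpy as np


def pink_noise(n, rng):
    """Approximate pink (1/f power) noise by shaping white noise in the frequency domain."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    bins = np.arange(len(spectrum), dtype=float)
    bins[0] = 1.0  # avoid dividing by zero at DC
    spectrum /= np.sqrt(bins)  # amplitude ~ 1/sqrt(f), so power ~ 1/f
    spectrum[0] = 0.0  # remove the DC offset
    pink = np.fft.irfft(spectrum, n)
    return pink / np.std(pink)  # normalize to unit variance


def synthetic_rir(rt60, sr, rng):
    """Hypothetical room impulse response: Gaussian noise under an exponential
    envelope whose level falls by 60 dB after rt60 seconds."""
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    envelope = 10.0 ** (-3.0 * t / rt60)  # 20*log10(envelope) = -60 dB at t = rt60
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that the signal-to-noise power ratio equals snr_db, then mix."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise


def augment(clean, sr, rt60=0.5, snr_db=10.0, seed=0):
    """Reverberate `clean`, then add pink noise at `snr_db` relative to the reverberant signal."""
    rng = np.random.default_rng(seed)
    wet = np.convolve(clean, synthetic_rir(rt60, sr, rng))[: len(clean)]
    return add_noise_at_snr(wet, pink_noise(len(clean), rng), snr_db)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    clean = np.sin(2 * np.pi * 220 * t)  # stand-in for a one-second speech clip
    # Progressively harsher conditions: longer reverberation, lower SNR.
    for rt60, snr_db in [(0.3, 20.0), (0.6, 10.0), (1.0, 0.0)]:
        noisy = augment(clean, sr, rt60, snr_db)
        print(rt60, snr_db, noisy.shape)
```

Sweeping `rt60` upward and `snr_db` downward yields test sets of increasing difficulty, in the spirit of the evaluation the abstract describes; the actual reverberation and noise levels used in the study are not stated here.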
Pages: 23
Related papers (50 in total)
  • [1] A real-world noise removal with wavelet speech feature
    Chiluveru, Samba Raju
    Tripathy, Manoj
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (03): 683-693
  • [3] The cafeteria study: Effects of facial masks, hearing protection, and real-world noise on speech recognition
    Barrett, Mary E.
    Gordon-Salant, Sandra
    Brungart, Douglas S.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2021, 150 (06): 4244-4255
  • [4] Measurement of Speech in Noise Abilities in Laboratory and Real-World Noise
    Shukla, Bhanu
    Rao, B. Srinivasa
    Saxena, Udit
    Verma, Himanshu
    INDIAN JOURNAL OF OTOLOGY, 2018, 24 (02): 109-113
  • [5] Speech Emotion Recognition Applied to Real-World Medical Consultation
    Huang, Ching-Tzu
    Huang, Chih-Wei
    Yang, Hsuan-Chia
    Li, Yu-Chuan
    MEDINFO 2023 - THE FUTURE IS ACCESSIBLE, 2024, 310: 1121-1125
  • [6] Study on Speaker-Independent Emotion Recognition from Speech on Real-World Data
    Kostoulas, Theodoros
    Ganchev, Todor
    Fakotakis, Nikos
    VERBAL AND NONVERBAL FEATURES OF HUMAN-HUMAN AND HUMAN-MACHINE INTERACTIONS, 2008, 5042: 235-242
  • [7] A Modulation Feature Set for Robust Automatic Speech Recognition in Additive Noise and Reverberation
    Liu, Xiaoyu
    Sadeghian, Roozbeh
    Zahorian, Stephen A.
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017: 5230-5234
  • [8] Effects of entropy in real-world noise on speech perception in listeners with normal hearing and hearing loss
    Jorgensen, Erik
    Wu, Yu-Hsiang
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 154 (06): 3627-3643
  • [9] Auditory processing of speech signals for robust speech recognition in real-world noisy environments
    Kim, DS
    Lee, SY
    Kil, RM
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7 (01): 55-69
  • [10] Hands-Free Speech Recognition Challenge for Real-World Speech Dialogue Systems
    Saruwatari, Hiroshi
    Kawanami, Hiromichi
    Takeuchi, Shota
    Takahashi, Yu
    Cincarek, Tobias
    Shikano, Kiyohiro
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009: 3729-3732