Speaker Identification Under Noisy Conditions Using Hybrid Deep Learning Model

Cited: 0
Authors
Lambamo, Wondimu [1 ]
Srinivasagan, Ramasamy [1 ,2 ]
Jifara, Worku [1 ]
Affiliations
[1] Adama Sci & Technol Univ, Adama 1888, Ethiopia
[2] King Faisal Univ, Al Hasa 31982, Saudi Arabia
Keywords
Speaker Identification; Convolutional Neural Network; Cochleogram; Bidirectional Gated Recurrent Unit; Real-World Noises; FEATURES; MFCC; VERIFICATION;
DOI
10.1007/978-3-031-57624-9_9
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Speaker identification is a biometric mechanism that determines which person, from a set of known speakers, is speaking. It has vital applications in areas such as security, surveillance, and forensic investigation. Speaker identification systems achieve good accuracy on clean speech; however, their performance degrades under noisy and mismatched conditions. Recently, hybrid networks combining convolutional neural networks (CNN) with enhanced recurrent neural network (RNN) variants have performed well in speech recognition, image classification, and other pattern recognition tasks. Moreover, cochleogram features have shown better accuracy in speech and speaker recognition under noisy conditions. However, no prior work has applied hybrid CNN and enhanced RNN variants to cochleogram input for speaker recognition to improve accuracy in noisy environments. This study proposes a speaker identification model for noisy conditions that uses a hybrid CNN and bidirectional gated recurrent unit (BiGRU) network on cochleogram input. The models were evaluated on the VoxCeleb1 speech dataset with real-world noise, with white Gaussian noise (WGN), and without additive noise. Real-world noise and WGN were added to the dataset at signal-to-noise ratios (SNRs) from -5 dB to 20 dB in 5 dB steps. The proposed model attained accuracies of 93.15%, 97.55%, and 98.60% on the dataset with real-world noise at SNRs of -5 dB, 10 dB, and 20 dB, respectively, and performed approximately the same on real-world noise and WGN at equal SNR levels. On the dataset without additive noise, the model achieved 98.85% accuracy. The evaluation accuracy and the comparison with previous works indicate that our model achieves better accuracy.
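The noise-mixing protocol described in the abstract (adding real-world noise or WGN at SNRs from -5 dB to 20 dB in 5 dB steps) can be sketched as below. This is a minimal illustration, not the authors' code: the function name and the synthetic signals are assumptions for demonstration.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so that mixing it into `speech` yields the target SNR in dB."""
    p_signal = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR(dB) = 10 * log10(p_signal / p_scaled_noise)  =>  solve for the scale factor
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Synthetic 1-second signals at 16 kHz stand in for speech and noise recordings
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)  # white Gaussian noise (WGN)

# The evaluation grid from the abstract: -5 dB to 20 dB in 5 dB steps
noisy_versions = {snr: add_noise_at_snr(speech, noise, snr)
                  for snr in range(-5, 25, 5)}
```

For real-world noise, a recorded noise segment (e.g., from a noise corpus) would replace the synthetic `noise` array; the scaling logic is the same.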
Pages: 154-175 (22 pages)