Speaker Identification Under Noisy Conditions Using Hybrid Deep Learning Model

Times cited: 0
Authors
Lambamo, Wondimu [1]
Srinivasagan, Ramasamy [1, 2]
Jifara, Worku [1]
Affiliations
[1] Adama Sci & Technol Univ, Adama 1888, Ethiopia
[2] King Faisal Univ, Al Hasa 31982, Saudi Arabia
Keywords
Speaker Identification; Convolutional Neural Network; Cochleogram; Bidirectional Gated Recurrent Unit; Real-World Noises; FEATURES; MFCC; VERIFICATION;
DOI
10.1007/978-3-031-57624-9_9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speaker identification is a biometric task that determines which person, from a set of known speakers, is speaking. It has important applications in security, surveillance, forensic investigation, and other areas. Speaker identification systems achieve good accuracy on clean speech, but their performance degrades under noisy and mismatched conditions. Recently, hybrid networks that combine convolutional neural networks (CNNs) with enhanced recurrent neural network (RNN) variants have performed well in speech recognition, image classification, and other pattern recognition tasks. Moreover, cochleogram features have been shown to improve accuracy in speech and speaker recognition under noisy conditions. However, hybrid CNN and enhanced RNN variants have not previously been applied to speaker recognition with cochleogram input to improve accuracy in noisy environments. This study proposes a speaker identification model for noisy conditions that applies a hybrid CNN and bidirectional gated recurrent unit (BiGRU) network to cochleogram input. The models were evaluated on the VoxCeleb1 speech dataset with real-world noise, with white Gaussian noise (WGN), and without additive noise. Real-world noise and WGN were added to the dataset at signal-to-noise ratios (SNRs) from -5 dB to 20 dB in 5 dB steps. On the dataset with real-world noise, the proposed model attained accuracies of 93.15%, 97.55%, and 98.60% at SNRs of -5 dB, 10 dB, and 20 dB, respectively. The model performs approximately the same on real-world noise and on WGN at the same SNR levels. On the dataset without additive noise, it achieved 98.85% accuracy. These results, together with a comparison against previous work, indicate that the proposed model achieves higher accuracy.
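The noise-corruption step described in the abstract (adding real-world noise or WGN to each utterance at a fixed SNR between -5 dB and 20 dB) can be sketched as below. This is a minimal sketch, not the authors' documented procedure: it assumes the standard power-ratio definition SNR_dB = 10 log10(P_speech / P_noise), and the helper names `mix_at_snr` and `add_wgn`, the looping of short noise clips, and the random choice of noise segment are all hypothetical.

```python
import numpy as np

# SNR grid from the abstract: -5 dB up to 20 dB in 5 dB steps.
SNR_LEVELS_DB = list(range(-5, 21, 5))  # [-5, 0, 5, 10, 15, 20]


def mix_at_snr(clean, noise, snr_db, rng=None):
    """Hypothetical helper: return clean + g * noise, with g chosen so that
    10 * log10(P_clean / P_scaled_noise) equals snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Assumption: loop a noise clip that is shorter than the utterance.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    # Assumption: use a random noise segment of the same length as the utterance.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that scales the noise power to p_clean / 10^(snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def add_wgn(clean, snr_db, rng=None):
    """Hypothetical helper: corrupt `clean` with white Gaussian noise at snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    return mix_at_snr(clean, rng.standard_normal(len(clean)), snr_db, rng)
```

Applying `add_wgn` (or `mix_at_snr` with a recorded noise clip) once per value in `SNR_LEVELS_DB` yields six noisy copies of each utterance, one per SNR condition. Fixing the gain from measured signal and noise power, rather than from a nominal noise level, keeps the realized SNR exact for every utterance regardless of its loudness.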
Pages: 154-175
Number of pages: 22