Speaker Identification Under Noisy Conditions Using Hybrid Deep Learning Model

Times Cited: 0
Authors
Lambamo, Wondimu [1 ]
Srinivasagan, Ramasamy [1 ,2 ]
Jifara, Worku [1 ]
Affiliations
[1] Adama Sci & Technol Univ, Adama 1888, Ethiopia
[2] King Faisal Univ, Al Hasa 31982, Saudi Arabia
Keywords
Speaker Identification; Convolutional Neural Network; Cochleogram; Bidirectional Gated Recurrent Unit; Real-World Noises
DOI
10.1007/978-3-031-57624-9_9
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Speaker identification is a biometric mechanism that determines which person, from a set of known speakers, is speaking. It has vital applications in areas such as security, surveillance, and forensic investigation. Speaker identification systems achieve good accuracy on clean speech, but their performance degrades under noisy and mismatched conditions. Recently, hybrid networks combining convolutional neural networks (CNNs) with enhanced recurrent neural network (RNN) variants have performed well in speech recognition, image classification, and other pattern recognition tasks. Moreover, cochleogram features have shown better accuracy in speech and speaker recognition under noisy conditions. However, hybrid CNN and enhanced RNN variants have not yet been applied to cochleogram input for speaker recognition to improve accuracy in noisy environments. This study proposes a speaker identification model for noisy conditions using a hybrid CNN and bidirectional gated recurrent unit (BiGRU) network on cochleogram input. The models were evaluated on the VoxCeleb1 speech dataset with real-world noise, with white Gaussian noise (WGN), and without additive noise. Real-world noise and WGN were added to the dataset at signal-to-noise ratios (SNRs) from -5 dB to 20 dB in 5 dB steps. The proposed model attained accuracies of 93.15%, 97.55%, and 98.60% on the dataset with real-world noise at SNRs of -5 dB, 10 dB, and 20 dB, respectively, and showed approximately similar performance on real-world noise and WGN at comparable SNR levels. On the dataset without additive noise, the model achieved 98.85% accuracy. The evaluation results and the comparison with previous work indicate that our model achieves better accuracy.
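The abstract describes a hybrid CNN-BiGRU classifier operating on cochleogram (gammatone filterbank) features. The Keras sketch below is a minimal illustration of that kind of architecture, not the authors' exact configuration: the input shape (64 cochlear channels by 300 frames), filter counts, GRU width, and the 1,251-class output (the number of speakers in VoxCeleb1) are assumptions for illustration only.

```python
from tensorflow.keras import layers, models

def build_cnn_bigru(num_speakers, n_channels=64, n_frames=300):
    """Hybrid CNN + BiGRU speaker classifier over a cochleogram (channels x frames x 1)."""
    inputs = layers.Input(shape=(n_channels, n_frames, 1))

    # CNN front end: extracts local time-frequency patterns from the cochleogram
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)

    # Turn the feature maps into a sequence over time for the recurrent stage
    x = layers.Permute((2, 1, 3))(x)                      # -> (time, channels, filters)
    time_steps, feat_dim = x.shape[1], x.shape[2] * x.shape[3]
    x = layers.Reshape((time_steps, feat_dim))(x)

    # BiGRU models temporal context in both directions
    x = layers.Bidirectional(layers.GRU(128))(x)

    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(num_speakers, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_cnn_bigru(num_speakers=1251)  # VoxCeleb1 contains 1,251 speakers
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```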
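The noise conditions described in the abstract (real-world noise or WGN added at SNRs from -5 dB to 20 dB in 5 dB steps) correspond to a standard power-based scaling of the noise signal before mixing. The helper below is a hedged sketch of that procedure; the function name and the use of NumPy arrays are assumptions for illustration, not code from the paper.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so the mixture has the requested signal-to-noise ratio."""
    # Tile or trim the noise so it covers the whole utterance
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    speech_power = np.mean(speech.astype(np.float64) ** 2)
    noise_power = np.mean(noise.astype(np.float64) ** 2) + 1e-12

    # SNR_dB = 10*log10(P_speech / (g^2 * P_noise))  =>  g = sqrt(P_speech / (P_noise * 10^(SNR_dB/10)))
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return speech + gain * noise

# SNR levels used in the evaluation: -5 dB up to 20 dB in 5 dB steps
snr_levels = [-5, 0, 5, 10, 15, 20]
```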
Pages: 154-175 (22 pages)