Wavelet Scattering Transform and CNN for Closed Set Speaker Identification

Cited by: 14
Authors
Ghezaiel, Wajdi [1]
Brun, Luc [2]
Lezoray, Olivier [2]
Affiliations
[1] Normandie Univ, ENSICAEN, UNICAEN, CNRS, NormaSTIC, F-14000 Caen, France
[2] Normandie Univ, UNICAEN, ENSICAEN, CNRS, Greyc, UMR 6072, F-14000 Caen, France
Keywords
Speaker identification; short utterances; wavelet scattering transform; convolutional neural network; hybrid network; verification
DOI
10.1109/mmsp48831.2020.9287061
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
In real-world applications, the performance of speaker identification systems degrades as both the amount and the quality of the available speech decrease. To address this, we propose a speaker identification system in which short utterances with few training examples are used for person identification; only a very small amount of data, a sentence of 2-4 seconds, is used. To achieve this, we propose a novel raw-waveform end-to-end convolutional neural network (CNN) for text-independent speaker identification. We use the wavelet scattering transform as a fixed initialization of the first layers of the CNN and learn the remaining layers in a supervised manner. The experiments show that our hybrid architecture, combining the wavelet scattering transform and a CNN, performs efficient feature extraction for speaker identification, even with a small number of short-duration training samples.
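The fixed front end described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only a first-order scattering-style stage (convolution with fixed complex wavelets, modulus, local averaging) in plain Python. The function names, the Morlet-like filter shape, and every parameter value (`width`, `length`, `pool`, the center frequencies) are assumptions chosen for illustration, and the learned CNN layers that would consume these features are omitted.

```python
import cmath
import math


def morlet(center_freq, width, length):
    """Complex Morlet-like wavelet on an odd number of samples.

    Hypothetical filter for illustration; it is not normalized and is not
    the filter bank used in the paper. `center_freq` is in cycles/sample.
    """
    half = length // 2
    return [
        cmath.exp(2j * math.pi * center_freq * t)
        * math.exp(-(t * t) / (2 * width * width))
        for t in range(-half, half + 1)
    ]


def conv_same(signal, kernel):
    """Direct convolution with zero padding; output length equals input length."""
    half = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0
        for k, w in enumerate(kernel):
            i = n - (k - half)
            if 0 <= i < len(signal):
                acc += signal[i] * w
        out.append(acc)
    return out


def first_order_scattering(signal, center_freqs, width=8.0, length=33, pool=16):
    """Fixed (non-learned) features: |x * psi_k| averaged over `pool` samples.

    Returns one list per wavelet, each of length len(signal) // pool.
    """
    features = []
    for f in center_freqs:
        envelope = [abs(v) for v in conv_same(signal, morlet(f, width, length))]
        pooled = [
            sum(envelope[i:i + pool]) / pool
            for i in range(0, len(envelope) - pool + 1, pool)
        ]
        features.append(pooled)
    return features


# Usage: a pure tone at 0.1 cycles/sample, analyzed by three fixed wavelets.
tone = [math.sin(2 * math.pi * 0.1 * n) for n in range(256)]
channels = first_order_scattering(tone, [0.05, 0.1, 0.2])
```

In a hybrid network of the kind the abstract describes, features like `channels` would be passed to convolutional layers whose weights are trained with speaker labels, while the wavelet filters stay fixed.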
Pages: 6
Related papers
50 in total
  • [1] Wavelet Scattering Transform Depth Benefit, An Application for Speaker Identification
    Moufidi, Abderrazzaq
    Rousseau, David
    Rasti, Pejman
    ARTIFICIAL NEURAL NETWORKS IN PATTERN RECOGNITION, ANNPR 2022, 2023, 13739 : 97 - 106
  • [2] Closed-Set Device-Independent Speaker Identification Using CNN
    Chakraborty, Tapas
    Barai, Bidhan
    Chatterjee, Bikshan
    Das, Nibaran
    Basu, Subhadip
    Nasipuri, Mita
    INTELLIGENT COMPUTING AND COMMUNICATION, ICICC 2019, 2020, 1034 : 291 - 299
  • [3] Speaker Identification Wavelet Transform Based Method
    Daqrouq, Khaled
    Al-Sawalmeh, Wael
    Al-Qawasmi, Abdel-Rahman
    Abu-Isbeib, Ibrahim N.
    2008 5TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS AND DEVICES, VOLS 1 AND 2, 2008, : 698 - 702
  • [4] A robust speaker identification system based on wavelet transform
    Hsieh, CT
    Wang, YC
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2001, E84D (07): : 839 - 846
  • [5] A study on improving decisions in closed set speaker identification
    Demirekler, M
    Saranli, A
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1127 - 1130
  • [6] Closed-set Speaker Identification in Speech Gateways
    Neiva, J.
    Guimaraes, A.
    Macedo, H.
    IEEE LATIN AMERICA TRANSACTIONS, 2014, 12 (06) : 1127 - 1133
  • [7] Speaker Identification System Using Wavelet Transform and Neural Network
    Daqrouq, K.
    Abu Hilal, T.
    Sherif, M.
    El-Hajar, S.
    Al-Qawasmi, A.
    2009 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTATIONAL TOOLS FOR ENGINEERING APPLICATIONS, 2009, : 560 - +
  • [8] Comparison of frequency bands in closed set speaker identification performance
    Orman, ÖD
    Arslan, L
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2000, 1902 : 314 - 318
  • [9] Robust speech features based on wavelet transform with application to speaker identification
    Hsieh, CT
    Lai, E
    Wang, YC
    IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 2002, 149 (02): : 108 - 114
  • [10] Improving Speaker Identification System Using Discrete Wavelet Transform and AWGN
    Maged, Heba
    AbouEl-Farag, Ahmed
    Mesbah, Saleh
    2014 5TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS), 2014, : 1171 - 1176