Convolutional neural network vectors for speaker recognition

Authors
Soufiane Hourri
Nikola S. Nikolov
Jamal Kharroubi
Affiliations
[1] Laboratoire des Systèmes Intelligents et Applications
[2] Faculté des Sciences et Techniques
[3] Université Sidi Mohamed Ben Abdellah
[4] University of Limerick
Keywords
Speaker recognition; MFCC; Convolutional neural network; Restricted Boltzmann machine; Deep learning
Abstract
Deep learning models are now considered state-of-the-art in many areas of pattern recognition. In speaker recognition, several architectures have been studied, such as deep neural networks (DNNs), deep belief networks (DBNs), and restricted Boltzmann machines (RBMs), while convolutional neural networks (CNNs) are the most widely used models in computer vision. The problem is that CNNs have been largely limited to the computer vision field because their structure is designed for two-dimensional data. To overcome this limitation, we aim to develop a customized CNN for speaker recognition. The goal of this paper is to propose a new approach to extracting speaker characteristics by constructing CNN filters linked to the speaker. In addition, we propose new vectors for identifying speakers, which we call convVectors. Experiments were performed on a gender-dependent corpus (THUYG-20 SRE) under three noise conditions: clean, 9 dB, and 0 dB. We compared the proposed method with our baseline system and with state-of-the-art methods. The results showed that the convVectors method was the most robust, improving on the baseline system by an average of 43% and achieving an equal error rate (EER) of 1.05%. This is an important finding for understanding how deep learning models can be adapted to the problem of speaker recognition.
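The equal error rate quoted in the abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a figure can be estimated from verification scores, assuming higher scores mean "same speaker". The function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the paper, whose scoring pipeline is not described here.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER from two sets of verification scores.

    Assumption: a trial is accepted when its score >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, impostor]))

    # False acceptance rate: impostor trials that would be accepted.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False rejection rate: target trials that would be rejected.
    frr = np.array([(target < t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Toy scores, invented purely for illustration.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of trials the crossing point is coarse. On a real evaluation set, interpolating between neighbouring thresholds gives a smoother estimate.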
Pages: 389-400
Number of pages: 11
Related papers
  • [21] Network Protocol Recognition Based on Convolutional Neural Network
    Wenbo Feng
    Zheng Hong
    Lifa Wu
    Menglin Fu
    Yihao Li
    Peihong Lin
    China Communications, 2020, 17 (04) : 125 - 139
  • [22] JOINT SPEAKER DIARIZATION AND RECOGNITION USING CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS
    Zhou, Zhihan
    Zhang, Yichi
    Duan, Zhiyao
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2496 - 2500
  • [23] Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech
    Simic, Nikola
    Suzic, Sinisa
    Nosek, Tijana
    Vujovic, Mia
    Peric, Zoran
    Savic, Milan
    Delic, Vlado
    ENTROPY, 2022, 24 (03)
  • [24] A deep learning approach to integrate convolutional neural networks in speaker recognition
    Hourri, Soufiane
    Nikolov, Nikola S.
    Kharroubi, Jamal
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (03) : 615 - 623
  • [26] Speaker recognition with a self-configuring neural network
    Lei, J
    Hall, LO
    1997 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, 1997, : 2351 - 2354
  • [27] A BAYESIAN ATTENTION NEURAL NETWORK LAYER FOR SPEAKER RECOGNITION
    Zhu, Weizhong
    Pelecanos, Jason
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6241 - 6245
  • [28] Optimization of Multilayer Neural Network Parameters for Speaker Recognition
    Tovarek, Jaromir
    Partila, Pavol
    Rozhon, Jan
    Voznak, Miroslav
    Skapa, Jan
    Uhrin, Dominik
    Chmelikova, Zdenka
    MACHINE INTELLIGENCE AND BIO-INSPIRED COMPUTATION: THEORY AND APPLICATIONS X, 2016, 9850
  • [29] An efficient speaker recognition using quantum neural network
    Kaur, Rupinderdeep
    Sharma, R. K.
    Kumar, Parteek
    MODERN PHYSICS LETTERS B, 2018, 32 (31):
  • [30] ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION
    McLaren, Mitchell
    Lei, Yun
    Ferrer, Luciana
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4814 - 4818