Convolutional neural network vectors for speaker recognition

Cited by: 0
Authors
Soufiane Hourri
Nikola S. Nikolov
Jamal Kharroubi
Affiliations
[1] Laboratoire des Systèmes Intelligents et Applications
[2] Faculté des Sciences et Techniques
[3] Université Sidi Mohamed Ben Abdellah
[4] University of Limerick
Keywords
Speaker recognition; MFCC; Convolutional neural network; Restricted Boltzmann machine; Deep learning
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Deep learning models are now considered state-of-the-art in many areas of pattern recognition. In speaker recognition, several architectures have been studied, such as deep neural networks (DNNs), deep belief networks (DBNs), and restricted Boltzmann machines (RBMs), while convolutional neural networks (CNNs) are the most widely used models in computer vision. However, CNNs have largely been confined to computer vision because their structure is designed for two-dimensional data. To overcome this limitation, we develop a customized CNN for speaker recognition. The goal of this paper is to propose a new approach that extracts speaker characteristics by constructing CNN filters linked to the speaker. We also propose new vectors for identifying speakers, which we call convVectors. Experiments were performed on a gender-dependent corpus (THUYG-20 SRE) under three noise conditions: clean, 9 dB, and 0 dB. We compared the proposed method with our baseline system and with state-of-the-art methods. The results showed that the convVectors method was the most robust, improving on the baseline system by an average of 43% and achieving an equal error rate (EER) of 1.05%. This finding is important for understanding how deep learning models can be adapted to the problem of speaker recognition.
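The abstract reports its headline result as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is only an illustration of that metric, not the authors' evaluation code: the function name `equal_error_rate`, its parameters, and the example scores are all hypothetical, and it approximates the EER by trying each observed score as a decision threshold.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by using every observed score as a threshold.

    target_scores:   scores of genuine (same-speaker) trials (hypothetical input)
    impostor_scores: scores of impostor (different-speaker) trials (hypothetical input)
    Higher scores are assumed to mean "more likely the same speaker".
    """
    if not target_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up scores for illustration only; they are not from the paper.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
print(f"EER = {equal_error_rate(targets, impostors):.2%}")
```

With only a handful of trials the estimate is coarse. Evaluation toolkits typically interpolate between thresholds on the error curve, which this sketch does not attempt.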
Pages: 389 - 400
Page count: 11
Related papers
50 in total
  • [1] Convolutional neural network vectors for speaker recognition
    Hourri, Soufiane
    Nikolov, Nikola S.
    Kharroubi, Jamal
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (02) : 389 - 400
  • [2] Empowering Speaker Verification with Deep Convolutional Neural Network Vectors
    Hourri, Soufiane
    STUDIES IN INFORMATICS AND CONTROL, 2024, 33 (02): 97 - 107
  • [3] Speaker Adaptation of Convolutional Neural Network using Speaker Specific Subspace Vectors of SGMM
    Karthick, Murali B.
    Kolhar, Prateek
    Umesh, S.
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 1096 - 1100
  • [4] Robust speaker recognition method based on convolutional neural network
    Zeng C.
    Ma C.
    Wang Z.
    Kong X.
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2020, 48 (06): 39 - 44
  • [5] Adaptive Convolutional Neural Network for Text-Independent Speaker Recognition
    Kim, Seong-Hu
    Park, Yong-Hwa
    INTERSPEECH 2021, 2021: 66 - 70
  • [6] Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition
    Cai, Danwei
    Cai, Zexin
    Li, Ming
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018: 1478 - 1482
  • [7] Rapid and Effective Speaker Adaptation of Convolutional Neural Network Based Models for Speech Recognition
    Abdel-Hamid, Ossama
    Jiang, Hui
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 1247 - 1251
  • [8] Speaker recognition using convolutional siamese neural networks
    Jung H.
    Yoon S.
    Park N.
    Transactions of the Korean Institute of Electrical Engineers, 2020, 69 (01): 164 - 169
  • [9] Speaker Recognition using Convolutional Neural Network with Minimal Training Data for Smart Home Solutions
    Wang, Mingshan
    Sirlapu, Tejaswini
    Kwasniewska, Alicja
    Szankin, Maciej
    Bartscherer, Marko
    Nicolas, Rey
    2018 11TH INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION (HSI), 2018: 139 - 145
  • [10] Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck Features
    Ranjan, Shivesh
    Hansen, John H. L.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 1009 - 1013