A deep learning approach to integrate convolutional neural networks in speaker recognition

Authors
Soufiane Hourri
Nikola S. Nikolov
Jamal Kharroubi
Affiliations
[1] Université Sidi Mohamed Ben Abdellah, Faculté des Sciences et Techniques, Laboratoire des Systèmes Intelligents et Applications
[2] University of Limerick
Keywords
Speaker recognition; MFCC; Convolutional neural network; Restricted Boltzmann Machine; Deep learning
DOI
Not available
Abstract
We propose a novel use of convolutional neural networks (CNNs) for speaker recognition. Although CNNs were designed primarily for computer vision, they have recently been applied to speaker recognition by treating spectrograms as input images. We believe this approach is suboptimal because it may compound two sources of error: one from solving a computer vision problem and one from solving the speaker recognition problem itself. In this work, we aim to integrate CNNs into speaker recognition without relying on images. We use Restricted Boltzmann Machines (RBMs) to extract speaker models as matrices and introduce a new way to model target and non-target speakers for speaker verification. A CNN then discriminates between target and non-target matrices. Experiments were conducted on the THUYG-20 SRE corpus under three noise conditions: clean, 9 dB, and 0 dB. The results show that our method outperforms state-of-the-art approaches, reducing the error rate by up to 60%.
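The abstract names three noise conditions by signal-to-noise ratio (clean, 9 dB, 0 dB). As a rough illustration of what those figures mean, the sketch below scales a noise signal so that a mixture reaches a requested SNR. It is not taken from the paper or from the THUYG-20 SRE corpus tooling: the function names `power`, `mix_at_snr`, and `measured_snr_db`, the 440 Hz test tone, the 16 kHz sampling rate, and the white Gaussian noise are all assumptions chosen for the example.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # SNR_dB = 10 * log10(P_clean / P_scaled_noise), solved for the noise power.
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, mixture):
    """Recover the SNR of a mixture, given the clean reference signal."""
    residual = [m - c for c, m in zip(clean, mixture)]
    return 10 * math.log10(power(clean) / power(residual))


rng = random.Random(0)
# Hypothetical stand-ins for speech and noise: a 440 Hz tone sampled at
# 16 kHz and white Gaussian noise of the same length.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for snr in (9, 0):
    noisy = mix_at_snr(clean, noise, snr)
    print(snr, round(measured_snr_db(clean, noisy), 6))
```

At 0 dB the scaled noise carries the same power as the clean signal, while at 9 dB the signal is roughly eight times stronger than the noise, which is why 0 dB is the hardest of the three conditions.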
Pages: 615-623
Page count: 8