A deep learning approach to integrate convolutional neural networks in speaker recognition

Authors:

Soufiane Hourri

Nikola S. Nikolov

Jamal Kharroubi

Affiliations:

[1] Université Sidi Mohamed Ben Abdellah, Faculté des Sciences et Techniques, Laboratoire des Systèmes Intelligents et Applications

[2] University of Limerick

Source:

International Journal of Speech Technology | 2020 / Vol. 23

Keywords:

Speaker recognition; MFCC; Convolutional neural network; Restricted Boltzmann Machine; Deep learning

Abstract:

We propose a novel use of convolutional neural networks (CNNs) for speaker recognition. Although designed primarily for computer vision problems, CNNs have recently been applied to speaker recognition by using spectrograms as input images. We believe that this approach is not optimal, as it may compound two sources of error: one from solving a computer vision problem and one from solving a speaker recognition problem. In this work, we aim to integrate CNNs into speaker recognition without relying on images. We use Restricted Boltzmann Machines (RBMs) to extract speaker models as matrices and introduce a new way to model target and non-target speakers in order to perform speaker verification. We then use a CNN to discriminate between target and non-target matrices. Experiments were conducted on the THUYG-20 SRE corpus under three noise conditions: clean, 9 dB, and 0 dB. The results show that our method outperforms state-of-the-art approaches, reducing the error rate by up to 60%.

Pages: 615-623

Number of pages: 8
