An optimum end-to-end text-independent speaker identification system using convolutional neural network

Cited by: 13

Authors:

Farsiani, Shabnam ^{[1]}

Izadkhah, Habib ^{[1]}

Lotfi, Shahriar ^{[1]}

Affiliations:

[1] Univ Tabriz, Fac Math Stat & Comp Sci, Dept Comp Sci, Tabriz, Iran

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2022 / Volume 100

Keywords:

Speaker identification; Text-independent; Convolutional neural network; Log-mel spectrogram; Data augmentation; FEATURES;

DOI:

10.1016/j.compeleceng.2022.107882

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

In recent years, convolutional neural networks (CNNs) have outperformed conventional methods in end-to-end speaker identification (SI) systems. The CNN training time is considerably long due to the need for large amounts of training data and high costs of computation and memory consumption. This paper proposes a new CNN for text-independent SI inspired by the VGG-13 architecture with fewer parameters but an acceptable accuracy. In addition to the proposed CNN, the time complexity and memory cost of network training can be reduced through offline feature extraction by using a short segment of each audio sample and online data augmentation. According to the results on Voxceleb1, the proposed system is more accurate than the other state-of-the-art methods in SI. Therefore, the proposed CNN improved the accuracy and decreased the training time.
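The abstract mentions offline feature extraction combined with a short segment of each audio sample and online data augmentation. One common way to combine these ideas is to compute the log-mel spectrogram once, ahead of training, and then draw a different fixed-length window from it each time the sample is used. The sketch below illustrates that idea only; the function name `random_crop`, the 64 mel bands, the 300-frame window, and the wrap-around padding are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def random_crop(features: np.ndarray, num_frames: int,
                rng: np.random.Generator) -> np.ndarray:
    """Return a random window of `num_frames` consecutive frames.

    `features` has shape (n_mels, total_frames). Inputs shorter than
    `num_frames` are repeated along the time axis first, so the output
    shape is always (n_mels, num_frames).
    """
    total = features.shape[1]
    if total < num_frames:
        reps = -(-num_frames // total)  # ceiling division
        features = np.tile(features, (1, reps))
        total = features.shape[1]
    # Upper bound is exclusive, so every valid start index can be drawn.
    start = int(rng.integers(0, total - num_frames + 1))
    return features[:, start:start + num_frames]


# Stand-in for a log-mel spectrogram computed once, offline
# (hypothetical sizes: 64 mel bands, 500 frames).
rng = np.random.default_rng(0)
logmel = rng.standard_normal((64, 500)).astype(np.float32)

crop = random_crop(logmel, 300, rng)
print(crop.shape)  # (64, 300)
```

Because the crop is drawn at training time, the network sees a different segment of the same utterance in each epoch, while the expensive spectrogram computation happens only once.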


Pages: 11

