An optimum end-to-end text-independent speaker identification system using convolutional neural network

Cited by: 13
Authors
Farsiani, Shabnam [1]
Izadkhah, Habib [1]
Lotfi, Shahriar [1]
Affiliation
[1] Univ Tabriz, Fac Math Stat & Comp Sci, Dept Comp Sci, Tabriz, Iran
Keywords
Speaker identification; Text-independent; Convolutional neural network; Log-mel spectrogram; Data augmentation; FEATURES;
DOI
10.1016/j.compeleceng.2022.107882
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, convolutional neural networks (CNNs) have outperformed conventional methods in end-to-end speaker identification (SI) systems. However, CNN training is slow because it requires large amounts of training data and incurs high computation and memory costs. This paper proposes a new CNN for text-independent SI, inspired by the VGG-13 architecture, that has fewer parameters yet retains acceptable accuracy. Beyond the network itself, the time complexity and memory cost of training can be reduced by extracting features offline from a short segment of each audio sample and by applying data augmentation online. On VoxCeleb1, the proposed system is more accurate than other state-of-the-art SI methods. The proposed CNN thus improves accuracy while decreasing training time.
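The split between offline feature extraction and online augmentation mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' configuration: the function names, the 16 kHz sample rate, the 512-point FFT, the 10 ms hop, the 40 mel bands, and the use of random time cropping as the online augmentation (the abstract does not say which augmentation is used).

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    # Offline step: computed once per audio sample and cached.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)          # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # power spectrum per frame
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel + 1e-10).T                      # shape: (n_mels, n_frames)

def random_segment(features, segment_frames, rng):
    # Online step (assumed augmentation): a different short window on each call.
    n_frames = features.shape[1]
    start = rng.integers(0, n_frames - segment_frames + 1)
    return features[:, start:start + segment_frames]

# Usage with a synthetic 3-second waveform standing in for real speech.
rng = np.random.default_rng(0)
waveform = rng.standard_normal(3 * 16000)
features = log_mel_spectrogram(waveform)      # done once, offline
segment = random_segment(features, 100, rng)  # done per training step, online
```

Caching the log-mel features means the FFT and filterbank cost is paid once per sample rather than once per epoch, while the per-step crop stays cheap enough to run inside the training loop.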
Pages: 11