TEMPORAL DYNAMIC CONVOLUTIONAL NEURAL NETWORK FOR TEXT-INDEPENDENT SPEAKER VERIFICATION AND PHONEMIC ANALYSIS

Times cited: 13
Authors
Kim, Seong-Hu [1]
Nam, Hyeonuk [1]
Park, Yong-Hwa [1]
Affiliation
[1] Korea Advanced Institute of Science and Technology (KAIST), Department of Mechanical Engineering, Daejeon, South Korea
Keywords
Speaker verification; text-independent; temporal dynamic convolutional neural network; phoneme-adaptive kernel
DOI
10.1109/ICASSP43922.2022.9747421
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In text-independent speaker recognition, dynamic models that adapt along the time axis have been proposed to account for the phoneme-varying characteristics of speech. However, how these dynamic models behave for different phonemes has not been analyzed in detail. In this paper, we propose a temporal dynamic CNN (TDY-CNN) that accounts for the temporal variation of phonemes by applying kernels that adapt optimally to each time bin. Each adaptive kernel is formed as a weighted sum of trained basis kernels, with weights that vary from one time bin to the next. We then analyze how the adaptive kernels respond to different phonemes in various layers. TDY-ResNet-38(x0.5), using six basis kernels, improved the equal error rate (EER), a speaker verification performance measure, by 17.3% compared with the baseline model ResNet-38(x0.5). In addition, we show that the adaptive kernels depend on phoneme groups and are more phoneme-specific in early layers. The temporal dynamic model adapts to phonemes without being given explicit phoneme information during training, and the results show the need to consider phoneme variation within utterances for more accurate and robust text-independent speaker verification.
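The abstract describes each time bin's kernel as a weighted sum of trained basis kernels. The NumPy sketch below illustrates that idea under stated assumptions. The attention branch that produces the per-time-bin weights is hypothetical: it averages the input over frequency, applies a linear map `attn_proj`, and takes a softmax over the K basis kernels. The record does not specify the paper's actual attention design, temperature, or layer shapes, so `temporal_dynamic_conv`, `conv2d_same`, and `attn_proj` are illustrative names only.

```python
import numpy as np

def softmax(z, axis, temperature=1.0):
    # Numerically stable softmax; the temperature is an assumed, optional knob.
    z = z / temperature
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def conv2d_same(x, w):
    """Naive zero-padded 2-D cross-correlation with 'same' output size.
    x: (C_in, F, T) feature map (channels, frequency bins, time bins)
    w: (C_out, C_in, kF, kT) kernel with odd kF and kT
    returns: (C_out, F, T)
    """
    _, _, kf, kt = w.shape
    _, F, T = x.shape
    xp = np.pad(x, ((0, 0), (kf // 2, kf // 2), (kt // 2, kt // 2)))
    out = np.zeros((w.shape[0], F, T))
    for i in range(kf):
        for j in range(kt):
            # Add the contribution of kernel tap (i, j) over all input channels.
            out += np.einsum('oc,cft->oft', w[:, :, i, j], xp[:, i:i + F, j:j + T])
    return out

def temporal_dynamic_conv(x, basis, attn_proj, temperature=1.0):
    """Hypothetical temporal dynamic convolution.
    basis: (K, C_out, C_in, kF, kT) trained basis kernels W_k
    attn_proj: (K, C_in) hypothetical linear map giving attention logits per time bin
    The effective kernel at time bin t is sum_k pi_k(t) * W_k.
    """
    desc = x.mean(axis=1)                  # (C_in, T): frequency-averaged descriptor
    logits = attn_proj @ desc              # (K, T)
    pi = softmax(logits, axis=0, temperature=temperature)  # sums to 1 over K per t
    # Convolution is linear in the kernel, so mixing the K basis outputs per time
    # bin equals convolving with the per-time-bin combined kernel at that bin.
    y = np.zeros((basis.shape[1],) + x.shape[1:])
    for k in range(basis.shape[0]):
        y += pi[k][None, None, :] * conv2d_same(x, basis[k])
    return y, pi

# Usage: six basis kernels, mirroring the TDY-ResNet-38(x0.5) setting in the abstract.
rng = np.random.default_rng(0)
C_in, C_out, F, T, K = 2, 3, 8, 10, 6
x = rng.standard_normal((C_in, F, T))
basis = rng.standard_normal((K, C_out, C_in, 3, 3))
attn_proj = rng.standard_normal((K, C_in))
y, pi = temporal_dynamic_conv(x, basis, attn_proj)
print(y.shape)  # (3, 8, 10)
```

Because convolution is linear in its kernel, the sketch convolves the input with each basis kernel once and then mixes the K outputs per time bin. This gives the same result as building and applying a separate kernel for every time bin, at the cost of K full convolutions.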
Pages: 6742-6746
Number of pages: 5