TEMPORAL DYNAMIC CONVOLUTIONAL NEURAL NETWORK FOR TEXT-INDEPENDENT SPEAKER VERIFICATION AND PHONEMIC ANALYSIS

Cited by: 13
Authors
Kim, Seong-Hu [1]
Nam, Hyeonuk [1]
Park, Yong-Hwa [1]
Affiliations
[1] Korea Advanced Institute of Science and Technology (KAIST), Department of Mechanical Engineering, Daejeon, South Korea
Keywords
Speaker verification; text-independent; temporal dynamic convolutional neural network; phoneme-adaptive kernel
DOI
10.1109/ICASSP43922.2022.9747421
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In text-independent speaker recognition, dynamic models that adapt along the time axis have been proposed to account for the phoneme-varying characteristics of speech. However, there has been little detailed analysis of how such dynamic models behave depending on phonemes. In this paper, we propose the temporal dynamic CNN (TDY-CNN), which accounts for the temporal variation of phonemes by applying kernels that adapt optimally to each time bin. Each adaptive kernel is a weighted sum of trained basis kernels. We then analyze how the adaptive kernels behave on different phonemes in various layers. TDY-ResNet-38(x0.5) with six basis kernels improved the equal error rate (EER), the measure of speaker verification performance, by 17.3% compared to the baseline model ResNet-38(x0.5). In addition, we show that the adaptive kernels depend on phoneme groups and are more phoneme-specific at early layers. The temporal dynamic model adapts itself to phonemes without being given explicit phoneme information during training, and the results show the need to consider phoneme variation within utterances for more accurate and robust text-independent speaker verification.
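The idea of a kernel formed per time bin as a weighted sum of trained basis kernels can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it is simplified to a 1D convolution over time, and the function name `temporal_dynamic_conv1d`, the projection matrix `attn_proj`, and the way the per-time-bin weights are produced (a softmax over a linear projection of the input frame) are assumptions made purely for illustration. Only the count of six basis kernels is taken from the abstract.

```python
import numpy as np

def softmax(z, axis=0):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_dynamic_conv1d(x, basis, attn_proj):
    """Hypothetical sketch of a time-adaptive convolution.

    x:         (C_in, T) input features over T time bins
    basis:     (K, C_out, C_in, kw) trained basis kernels, kw odd
    attn_proj: (K, C_in) assumed linear map producing kernel weights
    """
    K, C_out, C_in, kw = basis.shape
    T = x.shape[1]
    pad = kw // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))  # zero-pad the time axis

    # One weight per basis kernel and time bin; weights sum to 1 over K.
    pi = softmax(attn_proj @ x, axis=0)  # (K, T)

    y = np.zeros((C_out, T))
    for t in range(T):
        # Adaptive kernel for time bin t: weighted sum of basis kernels.
        w_t = np.tensordot(pi[:, t], basis, axes=1)  # (C_out, C_in, kw)
        window = xp[:, t:t + kw]                     # (C_in, kw)
        y[:, t] = np.tensordot(w_t, window, axes=([1, 2], [0, 1]))
    return y, pi

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))          # 4 channels, 10 time bins
basis = rng.standard_normal((6, 8, 4, 3)) # six basis kernels
attn_proj = rng.standard_normal((6, 4))
y, pi = temporal_dynamic_conv1d(x, basis, attn_proj)
print(y.shape, pi.shape)
```

Because convolution is linear in the kernel, applying the weighted-sum kernel at a time bin gives the same result as weighting the outputs of the individual basis kernels at that bin, which is one way to sanity-check such a layer.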
Pages: 6742-6746
Page count: 5