TEMPORAL DYNAMIC CONVOLUTIONAL NEURAL NETWORK FOR TEXT-INDEPENDENT SPEAKER VERIFICATION AND PHONEMIC ANALYSIS

Cited by: 13
Authors
Kim, Seong-Hu [1]
Nam, Hyeonuk [1]
Park, Yong-Hwa [1]
Affiliation
[1] Korea Advanced Institute of Science and Technology (KAIST), Department of Mechanical Engineering, Daejeon, South Korea
Keywords
Speaker verification; text-independent; temporal dynamic convolutional neural network; phoneme-adaptive kernel
DOI
10.1109/ICASSP43922.2022.9747421
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In text-independent speaker recognition, dynamic models that adapt along the time axis have been proposed to account for the phoneme-varying characteristics of speech. However, how such dynamic models behave for different phonemes has not been analyzed in detail. In this paper, we propose the temporal dynamic CNN (TDY-CNN), which accounts for the temporal variation of phonemes by applying kernels that adapt optimally to each time bin. Each adaptive kernel is a weighted sum of trained basis kernels, with weights that change from one time bin to the next. We then analyze how the adaptive kernels behave on different phonemes in various layers. TDY-ResNet-38(x0.5) with six basis kernels reduced the equal error rate (EER), the speaker verification performance metric, by 17.3% compared with the baseline model ResNet-38(x0.5). In addition, we show that the adaptive kernels depend on phoneme groups and are more phoneme-specific in early layers. The temporal dynamic model adapts itself to phonemes without being given explicit phoneme information during training, and the results show that phoneme variation within utterances must be considered for more accurate and robust text-independent speaker verification.
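The central mechanism in the abstract, a kernel formed for each time bin as a weighted sum of trained basis kernels, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the attention branch (average pooling over frequency, a single linear projection `proj` to K logits per time bin, and a softmax over the K basis kernels) is a simplified, hypothetical stand-in, and the helper names `conv2d_same`, `time_attention`, and `tdy_conv` are illustrative only. The sketch relies on the fact that convolution is linear in the kernel, so convolving with each basis kernel and then mixing the K outputs per time bin gives the same result as convolving with the mixed kernel for that time bin.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive 2-D cross-correlation with zero padding ('same' output size).

    x: (C_in, F, T) feature map (frequency x time).
    w: (C_out, C_in, kh, kw) kernel with odd kh and kw.
    """
    c_out, _, kh, kw = w.shape
    _, f, t = x.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    y = np.zeros((c_out, f, t))
    for i in range(kh):
        for j in range(kw):
            # Add the contribution of kernel tap (i, j), summed over input channels.
            y += np.einsum("oc,cft->oft", w[:, :, i, j], xp[:, i:i + f, j:j + t])
    return y


def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def time_attention(x, proj, temperature=1.0):
    """Hypothetical attention branch (assumption, not the paper's exact design):
    average over frequency, project each time bin to K logits, softmax over K.
    Returns pi of shape (K, T); each column sums to 1."""
    pooled = x.mean(axis=1)          # (C_in, T)
    logits = proj @ pooled           # (K, T)
    return softmax(logits / temperature, axis=0)


def tdy_conv(x, basis, proj):
    """Temporal dynamic convolution sketch.

    basis: (K, C_out, C_in, kh, kw) trained basis kernels.
    Time bin t is effectively filtered by the kernel sum_k pi[k, t] * basis[k].
    """
    pi = time_attention(x, proj)                          # (K, T)
    outs = np.stack([conv2d_same(x, w) for w in basis])   # (K, C_out, F, T)
    # Convolution is linear in the kernel, so mixing the K outputs per time bin
    # equals convolving with the per-time-bin mixed kernel.
    y = np.einsum("kt,koft->oft", pi, outs)               # (C_out, F, T)
    return y, pi


# Usage: K = 6 basis kernels as in the abstract; all other sizes are arbitrary.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 10))            # C_in=2, F=8, T=10
basis = rng.standard_normal((6, 3, 2, 3, 3))   # K=6, C_out=3, C_in=2, 3x3 kernels
proj = rng.standard_normal((6, 2))             # hypothetical attention weights
y, pi = tdy_conv(x, basis, proj)
print(y.shape)  # (3, 8, 10)

# Check the equivalence at one time bin by building the mixed kernel explicitly.
t0 = 4
w_t0 = np.einsum("k,koiab->oiab", pi[:, t0], basis)
print(np.allclose(conv2d_same(x, w_t0)[:, :, t0], y[:, :, t0]))  # True
```

Computing the K basis outputs once and mixing them per time bin avoids materializing a separate kernel for every time bin; the final check confirms that both formulations agree at the chosen time bin.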
Pages: 6742-6746
Number of pages: 5