Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification

Cited by: 0
Authors
Soleymani, Sobhan [1]
Dabouei, Ali [1]
Iranmanesh, Seyed Mehdi [1]
Kazemi, Hadi [1]
Dawson, Jeremy [1]
Nasrabadi, Nasser M. [1]
Affiliations
[1] West Virginia Univ, Morgantown, WV 26506 USA
Funding
National Science Foundation (USA);
Keywords
RECOGNITION;
DOI
Not available
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
This paper proposes a novel cross-device, text-independent speaker verification architecture. The majority of state-of-the-art deep architectures used for speaker verification take Mel-frequency cepstral coefficients as input. In contrast, our proposed Siamese convolutional neural network architecture uses Mel-frequency spectrogram coefficients to benefit from the dependency between adjacent spectro-temporal features. Although spectro-temporal features have proved to be highly reliable in speaker verification models, they represent only some aspects of the short-term, acoustic-level traits of the speaker's voice. The human voice, however, carries information at several linguistic levels, such as acoustic, lexical, prosodic, and phonetic, that can be utilized in speaker verification models. To compensate for these inherent shortcomings of spectro-temporal features, we propose to enhance the Siamese convolutional neural network architecture by deploying a multilayer perceptron network to incorporate the prosodic, jitter, and shimmer features. The proposed end-to-end verification architecture performs feature extraction and verification simultaneously. It displays significant improvement over classical signal processing approaches and deep algorithms for forensic cross-device speaker verification.
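The abstract names jitter and shimmer as inputs to the multilayer perceptron but does not say how they are computed. The sketch below assumes the common "local" definitions (mean absolute difference between consecutive glottal cycles, divided by the mean value); the function names and the three-element feature vector are hypothetical and are not taken from the paper.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the pitch period (assumed 'local' definition)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (assumed 'local' definition)."""
    return local_perturbation(peak_amplitudes)


def prosodic_vector(periods_s, peak_amplitudes):
    """Hypothetical feature vector: [mean F0 in Hz, jitter, shimmer]."""
    f0 = [1.0 / p for p in periods_s]
    return [mean(f0), jitter_local(periods_s), shimmer_local(peak_amplitudes)]


if __name__ == "__main__":
    # Illustrative values only: four pitch periods (seconds) and their peak amplitudes.
    periods = [0.010, 0.012, 0.010, 0.012]
    amplitudes = [0.80, 0.82, 0.79, 0.81]
    print(prosodic_vector(periods, amplitudes))
```

In an architecture like the one described, a vector of this kind would be fed to the multilayer perceptron alongside the Mel-spectrogram branch; how the two branches are fused is not specified in the abstract.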
Pages: 7