Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition

Cited by: 0
Authors
Cai, Danwei [1 ]
Cai, Zexin [1 ]
Li, Ming [1 ]
Affiliations
[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speaker verification; text-independent; CNN; supervector; deep speaker embedding; phonetic tokenizations; verification
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Lexical content variability across utterances is the key challenge for text-independent speaker verification. In this paper, we investigate supervised speaker embedding learning on supervectors, which can reduce the impact of lexical content mismatch among utterances. A DNN acoustic model aligns a feature sequence to a set of senones and generates a centered and normalized first-order statistics supervector. Statistics vectors from similar senones are placed together and reshaped into an image to maintain local continuity and correlation. The supervector image is then fed into a residual convolutional neural network. The deep speaker embedding features are the outputs of the last hidden layer of the network, and we employ a PLDA back-end for the subsequent modeling. Experimental results show that the proposed method outperforms the conventional GMM-UBM i-vector system and is complementary to the DNN-UBM i-vector system. The score-level fusion system achieves 1.26% EER and a DCF10 cost of 0.260 on the NIST SRE 10 extended core condition 5 task.
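The supervector step in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the senone posteriors are already available from some acoustic model, that each senone has a diagonal mean and standard deviation, and that a senone ordering (`order`) grouping similar senones has been computed elsewhere. The function names `first_order_stats` and `supervector_image` are hypothetical, and the normalization shown is one common form of centered, normalized first-order statistics rather than the paper's exact formula.

```python
import numpy as np


def first_order_stats(features, posteriors, means, stds, eps=1e-8):
    """Centered, normalized first-order statistics per senone (illustrative).

    features:   (T, D) frame-level acoustic features
    posteriors: (T, C) frame-level senone posteriors (assumed given)
    means:      (C, D) per-senone feature means (assumed given)
    stds:       (C, D) per-senone feature standard deviations (assumed given)
    Returns a (C, D) matrix; flattening it gives the supervector.
    """
    # Zeroth-order statistics: soft frame count per senone, shape (C,)
    n = posteriors.sum(axis=0)
    # First-order statistics: posterior-weighted feature sums, shape (C, D)
    f = posteriors.T @ features
    # Center around each senone mean
    centered = f - n[:, None] * means
    # Normalize by the soft count and the per-senone standard deviation
    return centered / ((n[:, None] + eps) * stds)


def supervector_image(stats, order):
    """Reorder senone rows so that similar senones are adjacent.

    `order` is a hypothetical permutation of senone indices; how it is
    obtained (for example by clustering senones) is outside this sketch.
    The resulting (C, D) matrix is treated as a single-channel image.
    """
    return stats[np.asarray(order)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, C = 200, 20, 16                         # toy sizes, not the paper's
    feats = rng.normal(size=(T, D))
    post = rng.dirichlet(np.ones(C), size=T)      # each row sums to 1
    stats = first_order_stats(feats, post, np.zeros((C, D)), np.ones((C, D)))
    image = supervector_image(stats, rng.permutation(C))
    print(image.shape)
```

In this sketch the reordered matrix would be the input to a convolutional network; the residual network, the PLDA back-end, and the score-level fusion described in the abstract are not shown.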
Pages: 1478-1482
Number of pages: 5