Text-independent speaker identification based on deep Gaussian correlation supervector

Cited by: 11
Authors
Sun, Linhui [1 ,2 ]
Gu, Ting [1 ]
Xie, Keli [1 ]
Chen, Jia [1 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Key Lab Broadband Wireless Commun & Sensor Networ, Minist Educ, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Gaussian mixture model; Deep neural network; Speaker identification; Bottleneck feature; Deep Gaussian correlation supervector; FEATURES;
DOI
10.1007/s10772-019-09618-5
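Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code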
0808 ; 0809 ;
Abstract
Great progress has been made in speaker recognition by extracting features from a Gaussian mixture model (GMM) or a deep neural network (DNN). In this paper, to capture speaker-specific characteristics more accurately, we propose a novel deep Gaussian correlation supervector (DGCS) feature based on a hybrid DBN-GMM model, where DBN denotes a deep belief network. In this method, we first extract Mel-frequency cepstral coefficients (MFCCs) from preprocessed speech signals and use a DBN to obtain bottleneck features. The bottleneck features are then fed to a GMM to extract a deep Gaussian supervector (DGS), which can serve as the input to a support vector machine (SVM) for pattern discrimination and decision. To further exploit the correlation between the deep mean vectors of the DGS, the DGS is transformed into the DGCS by supervector recombination. Our experiments show that using DGCS significantly improves the recognition rate: by 17.979% compared to the system that uses only the supervector, by 18.22% compared to the system with DGS, and by 1.875% compared to the system with the correlation supervector. In addition, the experiments with the proposed DGCS show that the time complexity of the identification task can be substantially reduced.
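As an illustration of the supervector step described in the abstract, the sketch below stacks GMM component means into a single vector. It is not the authors' implementation: it assumes mean-only MAP adaptation of a diagonal-covariance background GMM with a relevance factor of 16, which the abstract does not specify, and it uses random arrays in place of DBN bottleneck features. The function name `map_adapted_supervector` and all of its parameters are hypothetical; the DBN, the supervector recombination into DGCS, and the SVM classifier are omitted.

```python
import numpy as np


def map_adapted_supervector(frames, weights, means, variances, relevance=16.0):
    """Stack MAP-adapted GMM means into one supervector (illustrative only).

    frames:    (T, D) feature frames (stand-in for bottleneck features)
    weights:   (K,)   mixture weights of the background GMM
    means:     (K, D) component means
    variances: (K, D) diagonal covariances
    """
    # Log-density of every frame under every diagonal Gaussian: shape (T, K)
    diff = frames[:, None, :] - means[None, :, :]
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    log_joint = log_prob + np.log(weights)

    # Posterior responsibility of each component for each frame
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order sufficient statistics
    n_k = post.sum(axis=0)   # (K,)
    f_k = post.T @ frames    # (K, D)

    # Mean-only MAP adaptation: interpolate between data mean and prior mean
    alpha = (n_k / (n_k + relevance))[:, None]
    data_mean = f_k / np.maximum(n_k, 1e-10)[:, None]
    adapted = alpha * data_mean + (1.0 - alpha) * means

    # Concatenate the K adapted means into a single (K * D,) vector
    return adapted.reshape(-1)


# Toy usage with random data (assumed sizes: K = 4 components, D = 3 dims)
rng = np.random.default_rng(0)
K, D = 4, 3
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
frames = rng.normal(size=(200, D))

sv = map_adapted_supervector(frames, weights, means, variances)
print(sv.shape)  # (12,)
```

In a full system of the kind the abstract outlines, one such vector per utterance would be passed to a classifier such as an SVM; a very large `relevance` value leaves the supervector essentially equal to the background means.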
Pages: 449-457
Page count: 9
Related papers
50 in total
  • [1] Text-independent speaker identification based on deep Gaussian correlation supervector
    Linhui Sun
    Ting Gu
    Keli Xie
    Jia Chen
    [J]. International Journal of Speech Technology, 2019, 22 : 449 - 457
  • [2] Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition
    Cai, Danwei
    Cai, Zexin
    Li, Ming
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1478 - 1482
  • [3] Text-independent speaker identification
    Gish, Herbert
    Schmidt, Michael
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 1994, 11 (04) : 18 - 32
  • [4] A novel text-independent speaker identification method based on common Gaussian bases
    Hao, Chen
    Zhao, Rongchun
    [J]. 2005 INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND TECHNOLOGY, PROCEEDINGS, 2005, : 72 - 78
  • [5] ROBUST TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING GAUSSIAN MIXTURE SPEAKER MODELS
    REYNOLDS, DA
    ROSE, RC
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (01): : 72 - 83
  • [6] PLDA in the i-supervector space for text-independent speaker verification
    Ye Jiang
    Kong Aik Lee
    Longbiao Wang
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2014 (1)
  • [7] PLDA in the i-supervector space for text-independent speaker verification
    Jiang, Ye
    Lee, Kong Aik
    Wang, Longbiao
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2014, : 1 - 13
  • [8] Improved Text-Independent Speaker Identification and Verification with Gaussian Mixture Models
    Chakroun, Rania
    Frikha, Mondher
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT II, 2019, 11776 : 3 - 10
  • [9] Text-independent Speaker Identification in Birds
    Fox, E. J. S.
    Roberts, J. D.
    Bennamoun, M.
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2122 - 2125
  • [10] Dimensionality reduction for text-independent speaker identification using Gaussian Mixture Model
    El-Gamal, MA
    Abu El-Yazeed, MF
    El Ayadi, MMH
    [J]. Proceedings of the 46th IEEE International Midwest Symposium on Circuits & Systems, Vols 1-3, 2003, : 625 - 628