Text-independent speaker identification based on deep Gaussian correlation supervector

Cited by: 11
Authors
Sun, Linhui [1 ,2 ]
Gu, Ting [1 ]
Xie, Keli [1 ]
Chen, Jia [1 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Key Lab Broadband Wireless Commun & Sensor Networ, Minist Educ, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Gaussian mixture model; Deep neural network; Speaker identification; Bottleneck feature; Deep Gaussian correlation supervector; FEATURES;
DOI
10.1007/s10772-019-09618-5
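Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract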
Great progress has been made in speaker recognition by extracting features from a Gaussian mixture model (GMM) or a deep neural network (DNN). In this paper, to capture speaker-specific characteristics more accurately, we propose a novel deep Gaussian correlation supervector (DGCS) feature based on a DBN-GMM hybrid model. We first extract MFCCs from preprocessed speech signals and use a deep belief network (DBN) to obtain bottleneck features. The bottleneck features are then fed to a GMM to extract a deep Gaussian supervector (DGS), which can serve as the input to an SVM for pattern discrimination and decision. To further account for the correlation between the deep mean vectors of the DGS, the DGS is transformed into the DGCS by supervector recombination. Our experiments show that using the DGCS improves the recognition rate by 17.979% compared with the system using only the supervector, by 18.22% compared with the system using the DGS, and by 1.875% compared with the system using the correlation supervector. In addition, the proposed DGCS largely reduces the time complexity of the identification task.
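The abstract does not say how the supervector is built or how the recombination works, so the following is only a rough sketch under stated assumptions. It assumes the supervector is the concatenation of the GMM component mean vectors, and it uses pairwise Pearson correlation between those mean vectors as a stand-in for "considering the relevance between deep mean vectors". The function names `gaussian_supervector` and `correlation_features` and the toy means are hypothetical; this is not the paper's recombination method.

```python
from math import sqrt


def gaussian_supervector(means):
    """Concatenate K component mean vectors (each of length D) into one K*D list.

    Assumption: `means` stands for the GMM component means estimated on
    bottleneck features; the GMM training itself is not shown here.
    """
    return [x for mean in means for x in mean]


def pearson(u, v):
    """Pearson correlation of two equal-length vectors.

    Raises ZeroDivisionError if either vector is constant (zero variance).
    """
    n = len(u)
    mu_u = sum(u) / n
    mu_v = sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def correlation_features(means):
    """Illustrative stand-in only: correlation of every pair of mean vectors."""
    k = len(means)
    return [pearson(means[i], means[j]) for i in range(k) for j in range(i + 1, k)]


# Toy example: K = 3 components, D = 3 dimensions (made-up values).
means = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
supervector = gaussian_supervector(means)   # length K*D = 9
corr = correlation_features(means)          # length K*(K-1)/2 = 3
```

In a full pipeline, a vector of this kind would be computed per utterance and passed to a classifier such as an SVM, as the abstract describes for the DGS.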
Pages: 449-457
Page count: 9
Related papers
50 in total
  • [41] TEXT-INDEPENDENT SPEAKER RECOGNITION
    ATAL, BS
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1972, 52 (01): : 181 - &
  • [42] Frame level likelihood normalization for text-independent speaker identification using Gaussian Mixture Models
    Markov, K
    Nakagawa, S
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1764 - 1767
  • [43] On Metric-based Deep Embedding Learning for Text-Independent Speaker Verification
    Kashani, Hamidreza Baradaran
    Reza, Shaghayegh
    Rezaei, Iman Sarraf
    [J]. 2020 6TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2020,
  • [44] Context-adaptive Gaussian Attention for Text-independent Speaker Verification
    Peng, Junyi
    Gu, Rongzhi
    Zhang, Haoran
    Zou, Yuexian
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 595 - 599
  • [45] An MFCC-based text-independent speaker identification system for access control
    Liu, Jung-Chun
    Leu, Fang-Yie
    Lin, Guan-Liang
    Susanto, Heru
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (02):
  • [46] Fuzzy training algorithm for wavelet codebook based text-independent speaker identification
    Lung, SY
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2005, E88A (06) : 1619 - 1621
  • [47] Text-Independent Speaker Verification Using Variational Gaussian Mixture Model
    Moattar, Mohammad Hossein
    Homayounpour, Mohammad Mehdi
    [J]. ETRI JOURNAL, 2011, 33 (06) : 914 - 923
  • [48] Deep Neural Network Embeddings for Text-Independent Speaker Verification
    Snyder, David
    Garcia-Romero, Daniel
    Povey, Daniel
    Khudanpur, Sanjeev
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 999 - 1003
  • [49] Higher order information set based features for text-independent speaker identification
    Medikonda J.
    Madasu H.
    [J]. International Journal of Speech Technology, 2018, 21 (03) : 451 - 461
  • [50] Hybridization Process for Text-Independent Speaker Identification Based on Vector Quantization Model
    Djeghader, Mohammed
    Huang, Qin
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2016, : 596 - 601