Covariance Based Deep Feature for Text-Dependent Speaker Verification

Cited by: 2
Authors
Wang, Shuai [1 ]
Dinkel, Heinrich [1 ]
Qian, Yanmin [1 ]
Yu, Kai [1 ]
Affiliation
[1] Shanghai Jiao Tong Univ, Brain Sci & Technol Res Ctr, Key Lab Shanghai Educ Commiss Intelligent Interac, Dept Comp Sci & Engn,SpeechLab, Shanghai, Peoples R China
Keywords
Deep features; Text-dependent speaker verification; Speaker recognition; d-vector; j-vector; Covariance discrimination;
DOI
10.1007/978-3-030-02698-1_20
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The d-vector approach has achieved impressive results in speaker verification. An utterance-level representation is obtained by averaging the frame-level outputs of a hidden layer of a DNN. Although this mean-based speaker identity representation performs well, it ignores the variability of frames across the utterance, which leads to information loss. This is particularly serious for text-dependent speaker verification, where within-utterance feature variability reflects text variability better than the mean does. To address this issue, a new covariance-based speaker representation is proposed in this paper: the covariance of the frame-level outputs is calculated and incorporated into the speaker identity representation. The proposed approach is investigated within a joint multi-task learning framework for text-dependent speaker verification. Experiments on RSR2015 and RedDots show that the covariance-based deep feature significantly improves performance over the traditional mean-based deep feature.
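The record does not include an implementation, but the pooling idea described in the abstract can be illustrated with a minimal sketch. The Python/NumPy snippet below is illustrative only: the function and variable names are not from the paper, and how the covariance statistics are actually combined with the multi-task d-vector/j-vector framework is an assumption; the sketch merely contrasts mean pooling with mean-plus-covariance pooling of frame-level hidden-layer outputs.

    import numpy as np

    def mean_covariance_embedding(frame_feats: np.ndarray) -> np.ndarray:
        """Pool frame-level deep features (T x D) into an utterance-level vector.

        Returns the concatenation of the frame mean (the classic d-vector
        statistic) and the vectorised upper triangle of the frame covariance,
        which keeps within-utterance variability that the mean discards.
        """
        mean = frame_feats.mean(axis=0)             # (D,)
        cov = np.cov(frame_feats, rowvar=False)     # (D, D), features in columns
        iu = np.triu_indices(cov.shape[0])          # unique entries of the symmetric matrix
        return np.concatenate([mean, cov[iu]])

    # Example: 200 frames of 64-dimensional hidden-layer activations
    utt = np.random.randn(200, 64)
    embedding = mean_covariance_embedding(utt)
    print(embedding.shape)   # (2144,) = 64 + 64*65/2

Keeping only the upper triangle avoids duplicating the symmetric covariance entries; in practice the covariance term may be further normalised or projected before scoring, which this sketch does not attempt.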
Pages: 231 - 242
Number of pages: 12