Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition

Cited by: 0
Authors
Li, Lantian [1]
Lin, Yiye
Zhang, Zhiyong
Wang, Dong
Affiliation
[1] Tsinghua Univ, Ctr Speech & Language Technol, Div Tech Innovat & Dev, Tsinghua Natl Lab Informat Sci & Technol, Beijing, Peoples R China
Keywords
d-vector; dynamic time warping; speaker recognition; verification
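DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809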
Abstract
A deep learning approach has recently been proposed to derive speaker identities (d-vectors) with a deep neural network (DNN). This approach has been applied to text-dependent speaker recognition tasks and shows reasonable performance gains when combined with the conventional i-vector approach. Although promising, the existing d-vector implementation still cannot compete with the i-vector baseline. This paper presents two improvements for the deep learning approach: a phone-dependent DNN structure to normalize phone variation, and a new scoring approach based on dynamic time warping (DTW). Experiments on a text-dependent speaker recognition task demonstrated that the proposed methods can provide considerable performance improvement over the existing d-vector implementation.
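As an illustration of the DTW-based scoring idea mentioned in the abstract, the following is a minimal sketch only, not the authors' implementation. It assumes each utterance is a sequence of frame-level feature vectors, and it assumes cosine distance as the frame cost, the standard three-step DTW recursion, and normalization by the summed sequence lengths. None of these choices is stated in the abstract, and the name `dtw_score` is hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def dtw_score(enroll, test):
    """Hypothetical DTW score: negated, length-normalized alignment cost.

    enroll, test: lists of frame-level feature vectors (assumed input format).
    A higher score means the two sequences align more closely.
    """
    n, m = len(enroll), len(test)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost aligning enroll[:i] with test[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(enroll[i - 1], test[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j],      # repeat a test frame
                                   acc[i][j - 1],      # repeat an enroll frame
                                   acc[i - 1][j - 1])  # advance both
    return -acc[n][m] / (n + m)


# Toy data (made up for illustration): the second sequence is a
# time-stretched copy of the first, the third has different content.
enroll = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
stretched = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
other = [[0.0, 1.0], [1.0, 0.0], [-1.0, 1.0]]

same_score = dtw_score(enroll, stretched)
diff_score = dtw_score(enroll, other)
```

In this toy case the time-stretched copy aligns at essentially zero cost, while the mismatched sequence receives a lower score. This is the property that makes DTW attractive when the same phrase is spoken at different rates.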
Pages: 426-429
Number of pages: 4
Related papers
50 in total
  • [1] Deep feature for text-dependent speaker verification
    Liu, Yuan
    Qian, Yanmin
    Chen, Nanxin
    Fu, Tianfan
    Zhang, Ya
    Yu, Kai
    [J]. SPEECH COMMUNICATION, 2015, 73 : 1 - 13
  • [2] Speaker and Channel Factors in Text-Dependent Speaker Recognition
    Stafylakis, Themos
    Kenny, Patrick
    Alam, Md. Jahangir
    Kockmann, Marcel
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (01) : 65 - 78
  • [3] Deep Embedding Learning for Text-Dependent Speaker Verification
    Zhang, Peng
    Hu, Peng
    Zhang, Xueliang
    [J]. INTERSPEECH 2020, 2020, : 3461 - 3465
  • [4] Text-dependent Speaker Recognition for Vietnamese
    Diep Dao Thi Thu
    Quang Nguyen Hong
    Loan Trinh Van
    Hung Pham Ngoc
    [J]. 2013 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2013, : 196 - 200
  • [5] Covariance Based Deep Feature for Text-Dependent Speaker Verification
    Wang, Shuai
    Dinkel, Heinrich
    Qian, Yanmin
    Yu, Kai
    [J]. INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING, 2018, 11266 : 231 - 242
  • [6] Text-dependent speaker recognition using speaker specific compensation
    Laxman, S
    Sastry, PS
    [J]. IEEE TENCON 2003: CONFERENCE ON CONVERGENT TECHNOLOGIES FOR THE ASIA-PACIFIC REGION, VOLS 1-4, 2003, : 384 - 387
  • [7] MULTIPLE TEMPORAL SCALES BASED SPEAKER EMBEDDINGS LEARNING FOR TEXT-DEPENDENT SPEAKER RECOGNITION
    Wang, Wenchao
    Zhang, Yike
    Xu, Ji
    Yan, Yonghong
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6311 - 6315
  • [8] A Text-dependent Speaker-Recognition System
    Ishac, Dany
    Abche, Antoine
    Karam, Elie
    Nassar, Georges
    Callens, Dorothee
    [J]. 2017 IEEE INTERNATIONAL INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE (I2MTC), 2017, : 147 - 152
  • [9] Improving Text-Dependent Speaker Recognition Performance
    Impedovo, Donato
    Refice, Mario
    [J]. TOOLS AND APPLICATIONS WITH ARTIFICIAL INTELLIGENCE, 2009, 166 : 199 - 211
  • [10] An educational text-dependent speaker recognition system
    Ibrahim, Dogan
    [J]. INTERNATIONAL JOURNAL OF ELECTRICAL ENGINEERING EDUCATION, 2012, 49 (01) : 61 - 73