A Unified Deep Neural Network for Speaker and Language Recognition

Authors
Richardson, Fred [1]
Reynolds, Doug [1]
Dehak, Najim [2]
Affiliations
[1] MIT, Lincoln Lab, 244 Wood St, Lexington, MA 02173 USA
[2] MIT, CSAIL, Cambridge, MA USA
Keywords
i-vector; DNN; bottleneck features; speaker recognition; language recognition
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Significant performance gains have been reported separately for speaker recognition (SR) and language recognition (LR) tasks using either DNN posteriors of sub-phonetic units or DNN feature representations, but the two techniques have not been compared on the same SR or LR task or across SR and LR tasks using the same DNN. In this work, we present the application of a single DNN for both tasks using the 2013 Domain Adaptation Challenge speaker recognition (DAC13) and the NIST 2011 language recognition evaluation (LRE11) benchmarks. Using a single DNN trained on Switchboard data, we demonstrate large gains in performance on both benchmarks: a 55% reduction in EER for the DAC13 out-of-domain condition and a 48% reduction in C-avg on the LRE11 30s test condition. Score fusion and feature fusion are also investigated, as is the performance of the DNN technologies at short durations for SR.
Pages: 1146-1150
Page count: 5