On the Issue of Calibration in DNN-based Speaker Recognition Systems

Times cited: 5
Authors
McLaren, Mitchell [1]
Castan, Diego [1]
Ferrer, Luciana [2,3]
Lawson, Aaron [1]
Affiliations
[1] SRI Int, Speech Technol & Res Lab, 333 Ravenswood Ave, Menlo Pk, CA 94025 USA
[2] Univ Buenos Aires, FCEN, Dept Comp, Buenos Aires, DF, Argentina
[3] Consejo Nacl Invest Cient & Tecn, Buenos Aires, DF, Argentina
Keywords
speaker recognition; mismatch; calibration; deep neural network; bottleneck features
DOI
10.21437/Interspeech.2016-1134
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This article addresses calibration in Deep Neural Network (DNN)-based approaches to speaker recognition. DNNs have set a new standard in speaker recognition technology, either when used in place of the traditional universal background model (UBM) for feature alignment or when used to augment traditional features with features extracted from a bottleneck layer of the DNN. These techniques perform extremely well on constrained trial conditions that are well matched to the development conditions. However, when applied to unseen conditions or to a wide variety of conditions, some DNN-based techniques exhibit poor calibration. Through analysis on both the PRISM corpus and the recently released Speakers in the Wild (SITW) corpus, we show that bottleneck features hinder calibration when they are used to compute the first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work on DNN senone alignment, that uses the bottleneck features only for feature alignment during statistics calculation. This framework not only addresses the calibration issue but also yields a more computationally efficient bottleneck-feature system with improved discriminative power.
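The abstract does not give implementation details, so the following is only a minimal sketch of the hybrid-alignment idea, under stated assumptions: a diagonal-covariance GMM trained on bottleneck features supplies the per-frame component posteriors (the alignment), while the zeroth- and first-order Baum-Welch statistics are accumulated over a separate, frame-synchronous acoustic feature stream (for example MFCCs; the abstract does not name this stream). The function names and any model parameters are hypothetical. Centering of the first-order statistics and the i-vector extractor itself are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def log_gaussian_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(
    x: Vector,
    weights: Sequence[float],
    means: Sequence[Vector],
    variances: Sequence[Vector],
) -> List[float]:
    """Posterior P(c | x) of each GMM component, using log-sum-exp for stability."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_baum_welch_stats(
    align_frames: Sequence[Vector],
    stat_frames: Sequence[Vector],
    weights: Sequence[float],
    means: Sequence[Vector],
    variances: Sequence[Vector],
) -> Tuple[List[float], List[List[float]]]:
    """Zeroth- and first-order statistics with hybrid alignment (hypothetical sketch).

    align_frames: bottleneck features, used ONLY to compute the frame posteriors.
    stat_frames:  the other feature stream, accumulated into the first-order stats.
    The GMM parameters (weights, means, variances) live in the bottleneck space.
    """
    if len(align_frames) != len(stat_frames):
        raise ValueError("both streams must have one vector per frame")
    num_comp = len(weights)
    dim = len(stat_frames[0])
    n_stats = [0.0] * num_comp                      # N_c = sum_t gamma_t(c)
    f_stats = [[0.0] * dim for _ in range(num_comp)]  # F_c = sum_t gamma_t(c) * y_t
    for a, y in zip(align_frames, stat_frames):
        post = component_posteriors(a, weights, means, variances)
        for c, gamma in enumerate(post):
            n_stats[c] += gamma
            for d, value in enumerate(y):
                f_stats[c][d] += gamma * value
    return n_stats, f_stats
```

Because the posteriors of each frame sum to one, the zeroth-order statistics of an utterance sum to its number of frames regardless of which stream supplies the first-order statistics; only the alignment depends on the bottleneck features.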
Pages: 1825-1829
Number of pages: 5