IMPROVING THE PERFORMANCE OF VTLN UNDER MISMATCHED SPEAKER CONDITIONS AND MAKING IT APPROACH THAT OF MATCHED SPEAKER CONDITIONS

Authors
Sanand, D. R. [1 ]
Rath, S. P. [1 ]
Umesh, S. [1 ]
Affiliations
[1] Indian Inst Technol, Dept Elect Engn, Kanpur 208016, Uttar Pradesh, India
Keywords
Speaker Normalization; VTLN; Linear Transformation; Jacobian; MLLT; Adaptation
DOI
10.1109/ICASSP.2009.4960604
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The performance of conventional VTLN under mismatched train and test speaker conditions (e.g. adult-train, child-test) does not approach the performance under matched speaker conditions (e.g. child-train, child-test). In this paper, we investigate this problem and propose methods to reduce this gap in performance. We use our recently proposed linear transformation approach to VTLN, which, unlike conventional VTLN, also enables us to study the effect of the Jacobian. The main advantage of transform-based VTLN over adaptation-based approaches (like CMLLR) is that it does not require any matrix estimation. We argue that the degraded VTLN performance under mismatched speaker conditions is due to the significant frequency warping that is necessary for normalization, which leads to a mismatch between the correlation in the feature components of the test data and the covariance structure of the trained/normalized model. We show that the use of a global de-correlating transform (MLLT) leads to improved VTLN performance. Finally, we show that using both the Jacobian and MLLT together improves VTLN performance for mismatched cases, with the performance approaching that of matched speaker conditions.
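To make the role of the Jacobian concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each candidate warp factor has a precomputed linear transform A, that the model is a single diagonal-covariance Gaussian, and that the warp factor is chosen by maximizing log p(Ax) + log|det A| over the candidates. The function names, the 2x2 toy matrices, and the data are all hypothetical and chosen only for illustration.

```python
import math


def log_abs_det(a):
    """Return log|det A| of a small square matrix (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] for row in a]  # work on a copy
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return float("-inf")  # singular matrix
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return total


def matvec(a, x):
    """Multiply matrix a by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )


def pick_warp(frames, transforms, mean, var, use_jacobian=True):
    """Pick the warp factor whose transform maximizes the total score.

    transforms maps a warp factor to its (assumed precomputed) matrix A.
    With use_jacobian=True, log|det A| is added once per frame.
    """
    best_alpha, best_score = None, float("-inf")
    for alpha, a in transforms.items():
        jac = log_abs_det(a) if use_jacobian else 0.0
        score = sum(diag_gauss_loglik(matvec(a, x), mean, var) + jac for x in frames)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha


# Toy data: the features already match the model, so no warping is needed.
frames = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
transforms = {
    0.5: [[0.5, 0.0], [0.0, 0.5]],  # hypothetical "shrinking" transform
    1.0: [[1.0, 0.0], [0.0, 1.0]],  # identity (no warping)
}
mean, var = [0.0, 0.0], [1.0, 1.0]

print(pick_warp(frames, transforms, mean, var, use_jacobian=False))
print(pick_warp(frames, transforms, mean, var, use_jacobian=True))
```

In this toy setup, omitting the Jacobian rewards any transform that shrinks the features toward the model mean, whereas including log|det A| penalizes that shrinkage so the identity transform is selected. The sketch does not cover the MLLT step described in the abstract.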
Pages: 4397-4400
Page count: 4