IMPROVING THE PERFORMANCE OF VTLN UNDER MISMATCHED SPEAKER CONDITIONS AND MAKING IT APPROACH THAT OF MATCHED SPEAKER CONDITIONS

Authors
Sanand, D. R. [1]
Rath, S. P. [1]
Umesh, S. [1]
Affiliation
[1] Indian Inst Technol, Dept Elect Engn, Kanpur 208016, Uttar Pradesh, India
Keywords
Speaker Normalization; VTLN; Linear Transformation; Jacobian; MLLT; Adaptation
DOI
10.1109/ICASSP.2009.4960604
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The performance of conventional VTLN under mismatched train and test speaker conditions (e.g. adult-train, child-test) does not approach the performance under matched speaker conditions (e.g. child-train, child-test). In this paper, we investigate this problem and propose methods to reduce the performance gap. We use our recently proposed linear transformation approach to VTLN, which, unlike conventional VTLN, also enables us to study the effect of the Jacobian. The main advantage of transform-based VTLN over adaptation-based approaches (such as CMLLR) is that it does not require any matrix estimation. We argue that the degraded VTLN performance under mismatched speaker conditions is due to the significant frequency warping needed for normalization, which leads to a mismatch between the correlation among the feature components of the test data and the covariance structure of the trained/normalized model. We show that using a global de-correlating transform (MLLT) improves VTLN performance. Finally, we show that using the Jacobian and MLLT together improves VTLN performance in mismatched cases, with the performance approaching that of matched speaker conditions.
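To illustrate why the Jacobian matters once VTLN is expressed as a linear transform of the features, the following is a minimal sketch and not the authors' implementation. It assumes a single diagonal-covariance Gaussian as the model and uses scaled identity matrices as stand-ins for the warp matrices; the names `select_warp`, `diag_gauss_loglik`, and `toy_transforms` are hypothetical. The only ingredient taken as given is the standard change-of-variables result: when features are transformed as y = A x, the log-likelihood of each frame gains a term log|det A|.

```python
import numpy as np


def diag_gauss_loglik(feats, mean, var):
    """Total log-likelihood of all frames under one diagonal-covariance Gaussian."""
    diff = feats - mean
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + diff ** 2 / var)))


def select_warp(feats, transforms, mean, var, use_jacobian=True):
    """Pick the warp factor whose linear transform maximizes the likelihood.

    feats:      (T, D) array of unwarped feature frames.
    transforms: dict mapping warp factor -> (D, D) matrix (assumed precomputed).
    """
    best_alpha, best_score = None, -np.inf
    for alpha, A in transforms.items():
        warped = feats @ A.T                      # apply y = A x to every frame
        score = diag_gauss_loglik(warped, mean, var)
        if use_jacobian:
            # Change of variables: add log|det A| once per frame.
            _, logdet = np.linalg.slogdet(A)
            score += feats.shape[0] * logdet
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha


# Toy setup (all values are illustrative assumptions, not from the paper).
rng = np.random.default_rng(0)
toy_feats = rng.standard_normal((500, 2))        # data already matches the model
toy_mean, toy_var = np.zeros(2), np.ones(2)
toy_transforms = {
    0.8: 0.5 * np.eye(2),                        # shrinks the features
    1.0: np.eye(2),                              # no warping
    1.2: 2.0 * np.eye(2),                        # stretches the features
}

with_jac = select_warp(toy_feats, toy_transforms, toy_mean, toy_var, use_jacobian=True)
without_jac = select_warp(toy_feats, toy_transforms, toy_mean, toy_var, use_jacobian=False)
print("with Jacobian:", with_jac, "| without Jacobian:", without_jac)
```

In this toy case the data already match the model, so the unwarped choice is the sensible one. Dropping the Jacobian term biases the selection toward the transform that shrinks the features toward the model mean, which is one way to see why ignoring it can hurt when large warps are needed.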
Pages: 4397-4400
Page count: 4