Vocal tract length normalization for speaker independent acoustic-to-articulatory speech inversion

Cited by: 15
Authors
Sivaraman, Ganesh [1]
Mitra, Vikramjit [2]
Nam, Hosung [3]
Tiede, Mark [4]
Espy-Wilson, Carol [1]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] SRI Int, 333 Ravenswood Ave, Menlo Pk, CA 94025 USA
[3] Korea Univ, Dept English Language & Literature, Seoul, South Korea
[4] Haskins Labs Inc, New Haven, CT USA
Funding
National Science Foundation (USA);
Keywords
Acoustic to articulatory speech inversion; speaker normalization; Vocal Tract Length Normalization;
DOI
10.21437/Interspeech.2016-1399
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech inversion is a well-known ill-posed problem, and the addition of speaker differences typically makes it even harder. This paper investigates a vocal tract length normalization (VTLN) technique that transforms the acoustic space of different speakers to a target speaker's space so that speaker-specific details are minimized. The speaker-normalized features are then used to train a feed-forward neural network based acoustic-to-articulatory speech inversion system. The acoustic features are parameterized as time-contextualized mel-frequency cepstral coefficients, and the articulatory features are represented by six tract-variable (TV) trajectories. Experiments are performed with ten speakers from the University of Wisconsin X-ray Microbeam database. Speaker-dependent speech inversion systems are trained for each speaker as baselines against which the speaker-independent approach is compared. For each target speaker, data from the remaining nine speakers are transformed using the proposed approach, and the transformed features are used to train a speech inversion system. The performance of the individual systems is compared using the correlation between the estimated and the actual TVs on the target speaker's test set. Results show that the proposed speaker normalization approach provides a 7% absolute improvement in correlation compared with a system trained without speaker normalization.
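The two data-handling steps named in the abstract, time-contextualizing the acoustic frames and scoring estimated TVs by correlation, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the context width, the edge-padding choice, and the use of Pearson correlation computed per TV are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def stack_context(frames, context):
    """Concatenate each frame with `context` neighbours on either side.

    `frames` has shape (n_frames, n_coeffs). Edge frames are padded by
    repeating the first/last frame (an assumption, not from the paper).
    Returns an array of shape (n_frames, n_coeffs * (2 * context + 1)).
    """
    n_frames = frames.shape[0]
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    shifted = [padded[i:i + n_frames] for i in range(2 * context + 1)]
    return np.hstack(shifted)


def per_tv_correlation(estimated, actual):
    """Pearson correlation between estimated and actual trajectories.

    Both inputs have shape (n_frames, n_tvs); one value is returned per TV.
    """
    est = estimated - estimated.mean(axis=0)
    act = actual - actual.mean(axis=0)
    numerator = (est * act).sum(axis=0)
    denominator = np.sqrt((est ** 2).sum(axis=0) * (act ** 2).sum(axis=0))
    return numerator / denominator
```

With six TV trajectories, `per_tv_correlation` returns six values; averaging them into a single score per system is one plausible way to compare systems, though the abstract does not say how the correlations were aggregated.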
Pages: 455-459
Number of pages: 5
Related papers
50 in total
  • [1] Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract
    Csapo, Tamas Gabor
    [J]. INTERSPEECH 2020, 2020, : 3720 - 3724
  • [2] Autoregressive Articulatory WaveNet Flow for Speaker-Independent Acoustic-to-Articulatory Inversion
    Bozorg, Narjes
    Johnson, Michael T.
    Soleymanpour, Mohammad
    [J]. 2021 INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2021, : 156 - 161
  • [3] Unsupervised Acoustic-to-Articulatory Inversion with Variable Vocal Tract Anatomy
    Sun, Yifan
    Huang, Qinlong
    Wu, Xihong
    [J]. INTERSPEECH 2022, 2022, : 4656 - 4660
  • [4] Better acoustic normalization in subject independent acoustic-to-articulatory inversion: benefit to recognition
    Afshan, Amber
    Ghosh, Prasanta Kumar
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5395 - 5399
  • [5] Unsupervised Vocal-tract Length Estimation Through Model-based Acoustic-to-Articulatory Inversion
    Cai, Shanqing
    Bunnell, H. Timothy
    Patel, Rupal
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1711 - 1715
  • [6] Acoustic-to-articulatory mapping codebook constraint for determining vocal-tract length for inverse speech problem and articulatory synthesis
    Yu, ZL
    Zeng, SC
    [J]. 2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 827 - 830
  • [7] Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion
    Sivaraman, Ganesh
    Mitra, Vikramjit
    Nam, Hosung
    Tiede, Mark
    Espy-Wilson, Carol
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 146 (01): : 316 - 329
  • [8] Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion
    Ji, An
    Johnson, Michael T.
    Berry, Jeffrey J.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (10) : 1865 - 1875
  • [9] A SUBJECT-INDEPENDENT ACOUSTIC-TO-ARTICULATORY INVERSION
    Ghosh, Prasanta Kumar
    Narayanan, Shrikanth S.
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4624 - 4627
  • [10] Multi-corpus Acoustic-to-articulatory Speech Inversion
    Seneviratne, Nadee
    Sivaraman, Ganesh
    Espy-Wilson, Carol
    [J]. INTERSPEECH 2019, 2019, : 859 - 863