Vocal tract length normalization for speaker independent acoustic-to-articulatory speech inversion

Cited by: 15
Authors
Sivaraman, Ganesh [1 ]
Mitra, Vikramjit [2 ]
Nam, Hosung [3 ]
Tiede, Mark [4 ]
Espy-Wilson, Carol [1 ]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] SRI Int, 333 Ravenswood Ave, Menlo Pk, CA 94025 USA
[3] Korea Univ, Dept English Language & Literature, Seoul, South Korea
[4] Haskins Labs Inc, New Haven, CT USA
Funding
National Science Foundation (USA)
Keywords
Acoustic-to-articulatory speech inversion; speaker normalization; vocal tract length normalization
DOI
10.21437/Interspeech.2016-1399
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech inversion is a well-known ill-posed problem, and speaker differences typically make it even harder. This paper investigates a vocal tract length normalization (VTLN) technique that transforms the acoustic spaces of different speakers into a target speaker's space so that speaker-specific details are minimized. The speaker-normalized features are then used to train a feed-forward neural network-based acoustic-to-articulatory speech inversion system. The acoustic features are parameterized as time-contextualized mel-frequency cepstral coefficients, and the articulatory features are represented by six tract-variable (TV) trajectories. Experiments are performed with ten speakers from the University of Wisconsin X-ray Microbeam database. Speaker-dependent speech inversion systems are trained for each speaker as baselines against which to compare the speaker-independent approach. For each target speaker, data from the remaining nine speakers are transformed using the proposed approach, and the transformed features are used to train a speech inversion system. The systems are compared using the correlation between the estimated and actual TVs on the target speaker's test set. Results show that the proposed speaker normalization approach provides a 7% absolute improvement in correlation over a system trained without speaker normalization.
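The abstract names three computational pieces: a frequency-axis warp for VTLN, time-contextualized MFCC inputs, and correlation-based evaluation of estimated TVs. Below is a minimal Python sketch of each, under stated assumptions. It is not the authors' implementation. The piecewise-linear warp shape, the 8 kHz Nyquist frequency, the break point at 80% of Nyquist, the edge-repeating context window, and all function names are hypothetical choices for illustration. The abstract does not say how the per-speaker warp factor is estimated, so that step is omitted.

```python
import math


def warp_frequency(f_hz, alpha, f_nyquist=8000.0, break_ratio=0.8):
    """Hypothetical piecewise-linear VTLN warp of one frequency (Hz).

    Below the break frequency the axis is scaled by alpha; above it a second
    linear segment pins f_nyquist to itself, so the warped axis still spans
    [0, f_nyquist]. The parameter values are assumptions, not from the paper.
    """
    f_break = break_ratio * f_nyquist
    # The upper segment must keep a positive slope for the warp to stay monotonic.
    if alpha * f_break >= f_nyquist:
        raise ValueError("alpha too large for this break frequency")
    if f_hz <= f_break:
        return alpha * f_hz
    slope = (f_nyquist - alpha * f_break) / (f_nyquist - f_break)
    return alpha * f_break + slope * (f_hz - f_break)


def stack_context(frames, k):
    """Concatenate each frame with its k left and k right neighbours.

    frames is a list of per-frame feature vectors (e.g. MFCCs). Frames past
    the utterance edges are filled by repeating the first/last frame, which
    is an assumed padding choice.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        row = []
        for offset in range(-k, k + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to a valid frame index
            row.extend(frames[idx])
        stacked.append(row)
    return stacked


def pearson(x, y):
    """Pearson correlation between two equal-length trajectories."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return sxy / math.sqrt(sxx * syy)


def mean_correlation(estimated, actual):
    """Average Pearson correlation over TV channels (one trajectory per TV)."""
    scores = [pearson(e, a) for e, a in zip(estimated, actual)]
    return sum(scores) / len(scores)
```

For example, with these assumed settings `warp_frequency(1000.0, 1.1)` maps 1 kHz to 1.1 kHz, while 8 kHz stays at 8 kHz. In common VTLN front ends a warp of this kind is applied to the mel filterbank center frequencies before the cepstra are computed. A piecewise-linear shape is a frequent choice because it keeps the warped axis inside the analysis band, but the paper may use a different warping function.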
Pages: 455-459
Number of pages: 5
Related papers (50 in total)
  • [21] Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models
    Shahrebabaki, Abdolreza Sabzi
    Salvi, Giampiero
    Svendsen, Torbjorn
    Siniscalchi, Sabato Marco
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 135 - 147
  • [22] The impact of cross language on acoustic-to-articulatory inversion and its influence on articulatory speech synthesis
    Illa, Aravind
    Nair, Aanish
    Ghosh, Prasanta Kumar
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8267 - 8271
  • [23] Formant Trajectories for Acoustic-to-Articulatory Inversion
    Ozbek, I. Yuecel
    Hasegawa-Johnson, Mark
    Demirekler, Muebeccel
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2783 - +
  • [24] Analysis of acoustic-to-articulatory speech inversion across different accents and languages
    Sivaraman, Ganesh
    Espy-Wilson, Carol
    Wieling, Martijn
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 974 - 978
  • [25] ACOUSTIC-TO-ARTICULATORY INVERSION FOR DYSARTHRIC SPEECH BY USING CROSS-CORPUS ACOUSTIC-ARTICULATORY DATA
    Maharana, Sarthak Kumar
    Illa, Aravind
    Mannem, Renuka
    Belur, Yamini
    Shetty, Preetie
    Kumar, Veeramani Preethish
    Vengalil, Seena
    Polavarapu, Kiran
    Atchayaram, Nalini
    Ghosh, Prasanta Kumar
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6458 - 6462
  • [26] MLLR-PRSW for Kinematic-Independent Acoustic-to-Articulatory Inversion
    Bozorg, Narjes
    Johnson, Michael T.
    [J]. 2019 IEEE 19TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2019), 2019,
  • [27] NORMALIZING THE VOCAL-TRACT LENGTH FOR SPEAKER-INDEPENDENT SPEECH RECOGNITION
    LIN, QG
    CHE, CW
    [J]. IEEE SIGNAL PROCESSING LETTERS, 1995, 2 (11) : 201 - 203
  • [28] Incorporation of phonetic constraints in acoustic-to-articulatory inversion
    Potard, Blaise
    Laprie, Yves
    Ouni, Slim
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 123 (04): : 2310 - 2323
  • [29] A DEEP RECURRENT APPROACH FOR ACOUSTIC-TO-ARTICULATORY INVERSION
    Liu, Peng
    Yu, Quanjie
    Wu, Zhiyong
    Kang, Shiyin
    Meng, Helen
    Cai, Lianhong
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4450 - 4454
  • [30] A generalized smoothness criterion for acoustic-to-articulatory inversion
    Ghosh, Prasanta Kumar
    Narayanan, Shrikanth
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2010, 128 (04): : 2162 - 2172